Problem 1: Missing NLTK resource
**********************************************************************
Resource wordnet not found.
Please use the NLTK Downloader to obtain the resource:
>>> import nltk
>>> nltk.download('wordnet')
For more information see: https://www.nltk.org/data.html
Attempted to load corpora/wordnet
Searched in:
- './nltk_data'
- '/root/nltk_data'
- '/home/lyx/RAG/ragflow/.venv/nltk_data'
- '/home/lyx/RAG/ragflow/.venv/share/nltk_data'
- '/home/lyx/RAG/ragflow/.venv/lib/nltk_data'
- '/usr/share/nltk_data'
- '/usr/local/share/nltk_data'
- '/usr/lib/nltk_data'
- '/usr/local/lib/nltk_data'
**********************************************************************
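The "Searched in" list in the message above corresponds to the entries of nltk.data.path. Before downloading anything, it can help to see which resources are actually missing. The following is a minimal sketch, not part of RAGFlow: the RESOURCES mapping, missing_resources, and report are hypothetical names introduced here, and the three resource names are an assumption about what this setup needs. It relies only on nltk.data.find raising LookupError for a resource it cannot locate.

```python
from typing import Callable, Dict, List

# Resource name -> path inside an nltk_data directory.
# Assumption: these are the three resources this setup needs.
RESOURCES: Dict[str, str] = {
    "wordnet": "corpora/wordnet",
    "omw-1.4": "corpora/omw-1.4",
    "punkt": "tokenizers/punkt",
}


def missing_resources(resources: Dict[str, str],
                      find: Callable[[str], object]) -> List[str]:
    """Return the names whose path `find` cannot locate.

    `find` is expected to raise LookupError for a missing resource,
    which is what nltk.data.find does.
    """
    missing = []
    for name, path in resources.items():
        try:
            find(path)
        except LookupError:
            missing.append(name)
    return missing


def report() -> List[str]:
    """Run the check against the NLTK installed in the current environment."""
    import nltk  # imported lazily so the helper above has no hard dependency

    print("Search path:", nltk.data.path)
    return missing_resources(RESOURCES, nltk.data.find)
```

Calling report() inside the same virtual environment that runs RAGFlow prints the search path and returns the names still missing; an empty list means all three were found.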
Solution:
Activate your virtual environment, then run:
python -m nltk.downloader wordnet omw-1.4 punkt

Problem 2: Missing updown_concat_xgb.model
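In version 0.26.2, chunking with DeepDOC fails with the error: ragflow/rag/res/deepdoc/updown_concat_xgb.model": No such file or directory. The cause is that 0.26.x no longer needs updown_concat_xgb.model, but the code was not updated to match.
In ragflow/deepdoc/parser/pdf_parser.py, comment out the load_model calls and set self.updown_cnt_mdl to None: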
try:
    model_dir = os.path.join(get_project_base_directory(), "rag/res/deepdoc")
    # self.updown_cnt_mdl.load_model(os.path.join(model_dir, "updown_concat_xgb.model"))
    self.updown_cnt_mdl = None
except Exception:
    model_dir = snapshot_download(repo_id="InfiniFlow/text_concat_xgb_v1.0", local_dir=os.path.join(get_project_base_directory(), "rag/res/deepdoc"))
    # self.updown_cnt_mdl.load_model(os.path.join(model_dir, "updown_concat_xgb.model"))
    self.updown_cnt_mdl = None
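If you would rather not hard-code None, the same idea can be written as a load that is skipped when the file is absent. This is only a sketch of that pattern: load_optional_model is a hypothetical helper, not a RAGFlow function, and whether the rest of the parser tolerates a None model is this note's claim rather than something verified here.

```python
import os
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def load_optional_model(path: str, loader: Callable[[str], T]) -> Optional[T]:
    """Return loader(path) if the file exists, otherwise None.

    `loader` is whatever callable turns a file path into a model object;
    it is only invoked when the file is actually present.
    """
    if not os.path.isfile(path):
        return None
    return loader(path)
```

With this shape, a missing model file yields None without raising, while a file that is present is still loaded as before.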