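Reference code: https://github.com/PaddlePaddle/PaddleNLP/tree/develop/applications/experimental/pipelines/examples/question-answering
Visualization tool: https://streamlit.io/
Elasticsearch (ES) installation, official Docker guide: https://www.elastic.co/guide/en/enterprise-search/current/docker.html
ES installation tutorial: https://blog.csdn.net/smilehappiness/article/details/118466378
Docker images: see the PaddlePaddle image summary
Running in an Anaconda environment: see "Setting up and running a conda virtual environment on Linux"
For the contents of requirements.txt, see section 2.1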
# 1) Install the pipelines package
cd ${HOME}/PaddleNLP/applications/experimental/pipelines/
pip install -r requirements.txt
python setup.py install
# 2) Install the REST API dependencies
python ./rest_api/setup.py install
# 3) Install the Streamlit WebUI dependencies
python ./ui/setup.py install
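A quick way to confirm that the installs above succeeded is to ask the interpreter whether it can locate the packages. The sketch below assumes the top-level module names are `pipelines` and `rest_api` (both appear in the tracebacks further down); `missing_modules` is a hypothetical helper, not part of PaddleNLP.

```python
import importlib.util


def missing_modules(names):
    """Return the names from `names` that cannot be located in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed top-level module names; adjust them if your install differs.
print(missing_modules(["pipelines", "rest_api"]))
```

An empty list means both packages are visible to the active conda environment; any name that is printed points at the setup.py step that needs to be rerun.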
Docker reference: https://www.elastic.co/guide/en/elasticsearch/reference/current/docker.html
Installation package: https://www.elastic.co/cn/downloads/elasticsearch
ES installation tutorial: https://blog.csdn.net/smilehappiness/article/details/118466378
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-8.1.2-linux-x86_64.tar.gz
tar -xzvf elasticsearch-8.1.2-linux-x86_64.tar.gz
cd elasticsearch-8.1.2
./bin/elasticsearch
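Create a user (Elasticsearch will not start as root, so it needs a dedicated account):
useradd user-es
Give that user and its group ownership of the installation directory:
chown user-es:user-es -R /data/mart/elasticsearch-8.1.2
Switch to the user-es user:
su user-es
Change to the bin directory:
cd /data/mart/elasticsearch-8.1.2/bin
Start Elasticsearch:
./elasticsearch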
curl http://193.168.57.39:9200/_aliases?pretty=true
View the contents of an index:
http://193.168.57.111:9200/baike_cities/_search?pretty
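1. List all indices: curl 'localhost:9200/_cat/indices?v'
2. Query the test index (the ?pretty suffix pretty-prints the response): curl -XGET 'localhost:9200/test/_search?pretty'
3. Query all indices whose names start with test_: curl -XGET 'localhost:9200/test_*/_search?pretty'
4. Find indices whose names contain "student": curl 'localhost:9200/_cat/indices?v' | grep 'student'
5. Match documents by the ip field and return only the listed fields (Elasticsearch 6.0 and later reject a request body sent without the Content-Type header): curl -XPOST 'localhost:9200/test/_search?pretty' -H 'Content-Type: application/json' -d '{"query":{"match":{"ip":"2.2.2.2"}},"_source":["ip","name"]}'
6. Combine conditions with bool (must acts as AND, should acts as OR): curl -XPOST 'localhost:9200/test/_search?pretty' -H 'Content-Type: application/json' -d '{"query":{"bool":{"must":[{"term":{"name":"123"}},{"term":{"type":"3"}}]}}}'
7. Filter by time range, sort by time, and return a limited number of hits (from is the offset, so from 1 skips the first hit): curl -XGET 'localhost:9200/test/_search?pretty' -H 'Content-Type: application/json' -d '{"query":{"range":{"start_time":{"lte":1632363010000,"gte":1610467210000}}},"_source":["id","name","score","type","start_time"],"sort":[{"start_time":{"order":"asc"}}],"from":1,"size":4}'
8. Return two documents: curl -XGET 'localhost:9200/test*/_search?pretty' -H 'Content-Type: application/json' -d '{"from":1,"size":2}'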
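Hand-written JSON inside a shell string is easy to get wrong (quoting, a missing underscore in _source). As a rough sketch, the request bodies used in the curl commands above can be assembled in Python and printed as JSON instead. `build_search_body` and its parameters are hypothetical names introduced for illustration; only the body keys (query, bool, must, term, range, _source, sort, from, size) come from the Elasticsearch search API.

```python
import json


def build_search_body(must_terms=None, time_range=None, source=None,
                      sort_field=None, offset=0, size=10):
    """Build an Elasticsearch _search request body as a plain dict."""
    # Each term clause must match (AND semantics under bool/must).
    clauses = [{"term": {field: value}}
               for field, value in (must_terms or {}).items()]
    if time_range is not None:
        field, gte, lte = time_range
        clauses.append({"range": {field: {"gte": gte, "lte": lte}}})

    body = {"from": offset, "size": size}
    if clauses:
        body["query"] = {"bool": {"must": clauses}}
    if source is not None:
        body["_source"] = list(source)  # restrict the returned fields
    if sort_field is not None:
        body["sort"] = [{sort_field: {"order": "asc"}}]
    return body


body = build_search_body(
    must_terms={"name": "123", "type": "3"},
    time_range=("start_time", 1610467210000, 1632363010000),
    source=["id", "name", "start_time"],
    sort_field="start_time",
    offset=0,
    size=4,
)
# The printed JSON can be pasted after -d in a curl command.
print(json.dumps(body))
```

Building the body as a dict keeps the field names in one place and lets `json.dumps` handle the quoting.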
/bin/sh: pdftotext: command not found
Traceback (most recent call last):
File "/data/anaconda3/envs/python37/lib/python3.7/site-packages/pipelines-0.1.0a0-py3.7.egg/pipelines/pipelines/base.py", line 832, in _load_or_get_component
component_type=component_type, **component_params)
File "/data/anaconda3/envs/python37/lib/python3.7/site-packages/pipelines-0.1.0a0-py3.7.egg/pipelines/nodes/base.py", line 67, in load_from_args
instance = subclass(**kwargs)
File "/data/anaconda3/envs/python37/lib/python3.7/site-packages/pipelines-0.1.0a0-py3.7.egg/pipelines/nodes/file_converter/pdf.py", line 66, in __init__
""")
Exception: pdftotext is not installed. It is part of xpdf or poppler-utils software suite.
Installation on Linux:
wget --no-check-certificate https://dl.xpdfreader.com/xpdf-tools-linux-4.03.tar.gz &&
tar -xvf xpdf-tools-linux-4.03.tar.gz && sudo cp xpdf-tools-linux-4.03/bin64/pdftotext /usr/local/bin
Installation on MacOS:
brew install xpdf
You can find more details here: https://www.xpdfreader.com
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "rest_api/application.py", line 33, in <module>
from rest_api.controller.router import router as api_router
File "/data/anaconda3/envs/python37/lib/python3.7/site-packages/pipelines_rest_api-0.0.1a0-py3.7.egg/rest_api/controller/router.py", line 17, in <module>
from rest_api.controller import file_upload, search, feedback, document
File "/data/anaconda3/envs/python37/lib/python3.7/site-packages/pipelines_rest_api-0.0.1a0-py3.7.egg/rest_api/controller/file_upload.py", line 59, in <module>
Path(PIPELINE_YAML_PATH), pipeline_name=INDEXING_PIPELINE_NAME)
File "/data/anaconda3/envs/python37/lib/python3.7/site-packages/pipelines-0.1.0a0-py3.7.egg/pipelines/pipelines/base.py", line 278, in load_from_yaml
overwrite_with_env_variables=overwrite_with_env_variables,
File "/data/anaconda3/envs/python37/lib/python3.7/site-packages/pipelines-0.1.0a0-py3.7.egg/pipelines/pipelines/base.py", line 789, in load_from_config
components=components)
File "/data/anaconda3/envs/python37/lib/python3.7/site-packages/pipelines-0.1.0a0-py3.7.egg/pipelines/pipelines/base.py", line 835, in _load_or_get_component
raise Exception(f"Failed loading pipeline component '{name}': {e}")
Exception: Failed loading pipeline component 'PDFFileConverter': pdftotext is not installed. It is part of xpdf or poppler-utils software suite.
Installation on Linux:
wget --no-check-certificate https://dl.xpdfreader.com/xpdf-tools-linux-4.03.tar.gz &&
tar -xvf xpdf-tools-linux-4.03.tar.gz && sudo cp xpdf-tools-linux-4.03/bin64/pdftotext /usr/local/bin
Installation on MacOS:
brew install xpdf
To fix the error above, install pdftotext. Download page: http://www.xpdfreader.com/download.html
wget --no-check-certificate https://dl.xpdfreader.com/xpdf-tools-linux-4.04.tar.gz && tar -xvf xpdf-tools-linux-4.04.tar.gz && sudo cp xpdf-tools-linux-4.04/bin64/pdftotext /usr/local/bin
Installation notes for the pdftotext Python package: https://github.com/jalan/pdftotext
sudo yum install gcc-c++ pkgconfig poppler-cpp-devel python3-devel
pip install pdftotext
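The error above comes from the shell ("/bin/sh: pdftotext: command not found"), so what matters is whether the pdftotext binary is on the PATH of the user running the service. A minimal check, assuming nothing beyond the standard library (`find_missing_tools` is a hypothetical helper name):

```python
import shutil


def find_missing_tools(tools):
    """Return the command names in `tools` that are not on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = find_missing_tools(["pdftotext"])
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("pdftotext is at", shutil.which("pdftotext"))
```

Run it inside the same conda environment and as the same user that starts rest_api/application.py, since PATH can differ between shells.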
es/preprocessor/preprocessor.py", line 265, in split
language=self.language)
File "/data/anaconda3/envs/python37/lib/python3.7/site-packages/nltk/tokenize/__init__.py", line 106, in sent_tokenize
tokenizer = load(f"tokenizers/punkt/{language}.pickle")
File "/data/anaconda3/envs/python37/lib/python3.7/site-packages/nltk/data.py", line 750, in load
opened_resource = _open(resource_url)
File "/data/anaconda3/envs/python37/lib/python3.7/site-packages/nltk/data.py", line 876, in _open
return find(path_, path + [""]).open()
File "/data/anaconda3/envs/python37/lib/python3.7/site-packages/nltk/data.py", line 583, in find
raise LookupError(resource_not_found)
LookupError:
**********************************************************************
Resource punkt not found.
Please use the NLTK Downloader to obtain the resource:
>>> import nltk
>>> nltk.download('punkt')
(python37) [root@k8s-master02 pipelines]# python
Python 3.7.0 (default, Oct 9 2018, 10:31:47)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import nltk
>>> nltk.download('punkt')
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data] Unzipping tokenizers/punkt.zip.
True
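The interactive download above can also be scripted so that it runs once before the service starts. The sketch below is one possible shape, not the pipelines package's own behavior: `ensure_resource` is a hypothetical helper that takes the lookup and download functions as arguments, and it relies only on the fact, visible in the traceback, that NLTK raises LookupError when a resource is missing.

```python
def ensure_resource(resource_path, package, find, download):
    """Download `package` only if `find(resource_path)` raises LookupError.

    Returns True when a download was triggered, False when the resource
    was already present.
    """
    try:
        find(resource_path)
        return False
    except LookupError:
        download(package)
        return True


# With NLTK installed, this would be called as:
#   import nltk
#   ensure_resource("tokenizers/punkt", "punkt", nltk.data.find, nltk.download)
```

Passing the two functions in keeps the helper importable on machines without NLTK and avoids a repeated download on every start.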
If an uploaded file fails to load, save it in UTF-8 encoding and upload it again.
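Re-saving can be done in an editor, or with a few lines of Python when there are many files. The sketch below assumes the original file is GBK-encoded, which is common for text saved on Chinese-locale Windows but is only a guess here; pass the real encoding if it differs. `convert_to_utf8` and the file names are hypothetical.

```python
from pathlib import Path


def convert_to_utf8(src, dst, source_encoding="gbk"):
    """Re-encode the text file `src` as UTF-8 and write the result to `dst`."""
    text = Path(src).read_text(encoding=source_encoding)
    Path(dst).write_text(text, encoding="utf-8")
    return dst


# Example with hypothetical file names:
#   convert_to_utf8("faq_gbk.txt", "faq_utf8.txt", source_encoding="gbk")
```

If the source encoding is wrong, `read_text` raises UnicodeDecodeError rather than silently producing garbled text, which makes a bad guess easy to spot.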
Traceback (most recent call last):
File "/data/anaconda3/envs/python37/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/data/anaconda3/envs/python37/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/data/anaconda3/envs/python37/lib/python3.7/site-packages/streamlit-1.7.0-py3.7.egg/streamlit/__main__.py", line 21, in <module>
main(prog_name="streamlit")
File "/data/anaconda3/envs/python37/lib/python3.7/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/data/anaconda3/envs/python37/lib/python3.7/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/data/anaconda3/envs/python37/lib/python3.7/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/data/anaconda3/envs/python37/lib/python3.7/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/data/anaconda3/envs/python37/lib/python3.7/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/data/anaconda3/envs/python37/lib/python3.7/site-packages/streamlit-1.7.0-py3.7.egg/streamlit/cli.py", line 204, in main_run
_main_run(target, args, flag_options=kwargs)
File "/data/anaconda3/envs/python37/lib/python3.7/site-packages/streamlit-1.7.0-py3.7.egg/streamlit/cli.py", line 232, in _main_run
command_line = _get_command_line_as_string()
File "/data/anaconda3/envs/python37/lib/python3.7/site-packages/streamlit-1.7.0-py3.7.egg/streamlit/cli.py", line 221, in _get_command_line_as_string
cmd_line_as_list.extend(click.get_os_args())
AttributeError: module 'click' has no attribute 'get_os_args'
Reference solution: https://blog.csdn.net/qq_37223079/article/details/124315174
pip install click==8.0.0
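The traceback shows Streamlit 1.7.0 calling click.get_os_args, which newer click releases no longer provide. As a hedged sketch, the check below treats 8.1 as the first release without that function; this cutoff is an assumption based on the click changelog and should be confirmed there. `has_get_os_args` is a hypothetical helper that works on a version string such as the one printed by pip show click.

```python
def has_get_os_args(click_version):
    """Return True if `click_version` is older than 8.1 (assumed cutoff)."""
    parts = click_version.split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    return (major, minor) < (8, 1)


# Versions that should still work with Streamlit 1.7.0 under this assumption.
for version in ("7.1.2", "8.0.0", "8.1.3"):
    print(version, has_get_os_args(version))
```

Pinning click in requirements.txt alongside the other packages would keep a later install from pulling in an incompatible release.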
paddlenlp==2.3.3
paddleocr==2.5.0.3
requests==2.28.0
pydantic==1.9.1
mmh3==3.0.0
more-itertools==8.13.0
elasticsearch==7.10.0
SQLAlchemy==1.4.37
SQLAlchemy-Utils==0.38.2
langdetect==1.0.9
python-docx==0.8.11
nltk==3.7
pdfplumber==0.7.1
importlib-metadata==4.2.0
faiss-gpu==1.7.2
docker pull registry.baidubce.com/paddlepaddle/paddle:latest-gpu-cuda10.1-cudnn7-gcc54-dev
nvidia-docker run -it --entrypoint=/bin/bash registry.baidubce.com/paddlepaddle/paddle:latest-gpu-cuda10.1-cudnn7-gcc54-dev
pip3.7 install paddlepaddle-gpu==2.3.0.post101 -f https://www.paddlepaddle.org.cn/whl/linux/mkl/avx/stable.html
pip3.7 install -r requirements.txt
python3.7 setup.py install
python3.7 examples/question-answering/dense_qa_example.py --device gpu
FROM registry.baidubce.com/paddlepaddle/paddle:latest-gpu-cuda10.1-cudnn7-gcc54-dev
COPY . /deploy
WORKDIR /deploy
RUN pip config set global.index-url https://mirror.baidu.com/pypi/simple \
&& python3.7 -m pip install --upgrade setuptools \
&& python3.7 -m pip install --upgrade pip \
&& python3.7 -m pip install paddlepaddle-gpu==2.3.0.post101 \
-f https://www.paddlepaddle.org.cn/whl/linux/mkl/avx/stable.html \
# 1) Install the pipelines package \
&& cd pipelines/ \
&& pip3.7 install -r requirements.txt \
&& python3.7 setup.py install
ENTRYPOINT export CUDA_VISIBLE_DEVICES=0 && \
python3.7 pipelines/examples/question-answering/dense_qa_example.py --device gpu
nvidia-docker build -t chatbot-qa:1.0.0.0630 .
nvidia-docker run --name chatbot-qa -d chatbot-qa:1.0.0.0630
nvidia-docker exec -it chatbot-qa /bin/bash
docker logs chatbot-qa