
Intelligent Customer Service System Series 2: An End-to-End Intelligent Question Answering System


0. References

Reference code: https://github.com/PaddlePaddle/PaddleNLP/tree/develop/applications/experimental/pipelines/examples/question-answering
Visualization tool: https://streamlit.io/
Elasticsearch (ES) installation, official docs: https://www.elastic.co/guide/en/enterprise-search/current/docker.html
ES installation tutorial: https://blog.csdn.net/smilehappiness/article/details/118466378
Docker images: see the list of PaddlePaddle images

1. Setting up and running in a local virtual environment

1.1 Creating the virtual environment

Using Anaconda: see the guide on creating and running a conda virtual environment on Linux.

1.2 Installation commands

For the contents of requirements.txt, see section 2.1.

# 1) Install the pipelines package
cd ${HOME}/PaddleNLP/applications/experimental/pipelines/
pip install -r requirements.txt
python setup.py install
# 2) Install the RestAPI dependencies
python ./rest_api/setup.py install
# 3) Install the Streamlit WebUI dependencies
python ./ui/setup.py install

1.3 Install Elasticsearch (skip this step if you can use an existing ES cluster)

Docker guide: https://www.elastic.co/guide/en/elasticsearch/reference/current/docker.html
Installation packages: https://www.elastic.co/cn/downloads/elasticsearch
ES installation tutorial: https://blog.csdn.net/smilehappiness/article/details/118466378

wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-8.1.2-linux-x86_64.tar.gz
tar -xzvf elasticsearch-8.1.2-linux-x86_64.tar.gz
cd elasticsearch-8.1.2
./bin/elasticsearch

Elasticsearch refuses to run as root, so if you are logged in as root, start it as a dedicated user instead (the paths below assume the archive was extracted under /data/mart):

Create the user:
useradd user-es

Give that user and group ownership of the installation directory:
chown user-es:user-es -R /data/mart/elasticsearch-8.1.2

Switch to user-es:
su user-es

Go to the bin directory:
cd /data/mart/elasticsearch-8.1.2/bin

Start Elasticsearch:
./elasticsearch
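List the aliases to check that the node is running. Elasticsearch 8.x enables security (TLS and authentication) by default, so plain-HTTP requests like the ones in this section only work if security was disabled, for example with xpack.security.enabled: false in config/elasticsearch.yml before the first start; otherwise use https and the elastic user's credentials.
curl http://193.168.57.39:9200/_aliases?pretty=true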
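If you would rather script this check than run curl by hand (for example before starting the REST API), here is a minimal sketch using only the Python standard library. It polls the _cluster/health endpoint and assumes plain HTTP without authentication, as in the curl command above; the function name wait_for_es and the retry defaults are illustrative, not part of pipelines.

```python
import json
import time
import urllib.request


def wait_for_es(base_url="http://localhost:9200", retries=10, delay=3.0, timeout=5.0):
    """Poll GET /_cluster/health until Elasticsearch answers.

    Returns the parsed health JSON, or None if ES never became reachable.
    Assumes security is disabled (plain HTTP, no credentials); with security
    on, the 401 response is treated like any other failure and retried.
    """
    url = base_url.rstrip("/") + "/_cluster/health"
    for attempt in range(retries):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                return json.loads(resp.read().decode("utf-8"))
        except (OSError, ValueError):
            # URLError/HTTPError/timeouts are OSError subclasses; bad JSON is ValueError
            if attempt < retries - 1:
                time.sleep(delay)
    return None
```

Usage: health = wait_for_es("http://193.168.57.39:9200"), then check health["status"] ("green", "yellow" or "red") when the result is not None.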

View the documents in an index:

http://193.168.57.111:9200/baike_cities/_search?pretty

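1. List all indices: curl 'localhost:9200/_cat/indices?v'
2. Search the test index (the ?pretty suffix formats the output): curl -XGET 'localhost:9200/test/_search?pretty'
3. Search all indices whose names start with test_: curl -XGET 'localhost:9200/test_*/_search?pretty'
4. Find indices whose names contain student: curl 'localhost:9200/_cat/indices?v' | grep 'student'
5. Match documents by the ip field and return only the listed fields: curl -XPOST 'localhost:9200/test/_search?pretty' -H 'Content-Type: application/json' -d '{"query":{"match":{"ip":"2.2.2.2"}},"_source":["ip","name"]}'
6. Boolean query, where must acts as AND and should acts as OR: curl -XPOST 'localhost:9200/test/_search?pretty' -H 'Content-Type: application/json' -d '{"query":{"bool":{"must":[{"term":{"name":"123"}},{"term":{"type":"3"}}]}}}'
7. Filter by a time range, sort by time and return a page of results: curl -XGET 'localhost:9200/test/_search?pretty' -H 'Content-Type: application/json' -d '{"query":{"range":{"start_time":{"lte":1632363010000,"gte":1610467210000}}},"_source":["id","name","score","type","start_time"],"sort":[{"start_time":{"order":"asc"}}],"from":1,"size":4}'
8. Return two documents (from is a zero-based offset, so "from":1 skips the first hit): curl -XGET 'localhost:9200/test*/_search?pretty' -H 'Content-Type: application/json' -d '{"from":1,"size":2}'

Since Elasticsearch 6.0, requests that carry a JSON body must send the Content-Type: application/json header, which is why examples 5 to 8 include it.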
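The same searches can also be issued from Python. A minimal sketch with the standard library: it builds the bodies of examples 6 and 7 above as dicts (index, field names and values are copied from those examples) and sends them with a Content-Type: application/json header, which Elasticsearch requires for request bodies. The helper names are illustrative; in a real project you might use the elasticsearch client pinned in requirements.txt instead.

```python
import json
import urllib.request


def bool_must_query(name, type_):
    """Example 6: both term clauses must match (AND); use "should" for OR."""
    return {"query": {"bool": {"must": [
        {"term": {"name": name}},
        {"term": {"type": type_}},
    ]}}}


def time_range_query(gte, lte, fields, size=4, offset=1):
    """Example 7: range filter on start_time, sorted ascending, paged with from/size."""
    return {
        "query": {"range": {"start_time": {"gte": gte, "lte": lte}}},
        "_source": fields,
        "sort": [{"start_time": {"order": "asc"}}],
        "from": offset,  # zero-based offset into the hits
        "size": size,
    }


def search(index, body, base_url="http://localhost:9200", timeout=10.0):
    """POST body to /<index>/_search and return the parsed JSON response."""
    req = urllib.request.Request(
        f"{base_url.rstrip('/')}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage: search("test", bool_must_query("123", "3"))["hits"]["hits"] returns the matching documents.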

1.4 Error when running python rest_api/application.py 8891

/bin/sh: pdftotext: command not found
Traceback (most recent call last):
  File "/data/anaconda3/envs/python37/lib/python3.7/site-packages/pipelines-0.1.0a0-py3.7.egg/pipelines/pipelines/base.py", line 832, in _load_or_get_component
    component_type=component_type, **component_params)
  File "/data/anaconda3/envs/python37/lib/python3.7/site-packages/pipelines-0.1.0a0-py3.7.egg/pipelines/nodes/base.py", line 67, in load_from_args
    instance = subclass(**kwargs)
  File "/data/anaconda3/envs/python37/lib/python3.7/site-packages/pipelines-0.1.0a0-py3.7.egg/pipelines/nodes/file_converter/pdf.py", line 66, in __init__
    """)
Exception: pdftotext is not installed. It is part of xpdf or poppler-utils software suite.
                
                   Installation on Linux:
                   wget --no-check-certificate https://dl.xpdfreader.com/xpdf-tools-linux-4.03.tar.gz &&
                   tar -xvf xpdf-tools-linux-4.03.tar.gz && sudo cp xpdf-tools-linux-4.03/bin64/pdftotext /usr/local/bin
                   
                   Installation on MacOS:
                   brew install xpdf
                   
                   You can find more details here: https://www.xpdfreader.com
                

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "rest_api/application.py", line 33, in <module>
    from rest_api.controller.router import router as api_router
  File "/data/anaconda3/envs/python37/lib/python3.7/site-packages/pipelines_rest_api-0.0.1a0-py3.7.egg/rest_api/controller/router.py", line 17, in <module>
    from rest_api.controller import file_upload, search, feedback, document
  File "/data/anaconda3/envs/python37/lib/python3.7/site-packages/pipelines_rest_api-0.0.1a0-py3.7.egg/rest_api/controller/file_upload.py", line 59, in <module>
    Path(PIPELINE_YAML_PATH), pipeline_name=INDEXING_PIPELINE_NAME)
  File "/data/anaconda3/envs/python37/lib/python3.7/site-packages/pipelines-0.1.0a0-py3.7.egg/pipelines/pipelines/base.py", line 278, in load_from_yaml
    overwrite_with_env_variables=overwrite_with_env_variables,
  File "/data/anaconda3/envs/python37/lib/python3.7/site-packages/pipelines-0.1.0a0-py3.7.egg/pipelines/pipelines/base.py", line 789, in load_from_config
    components=components)
  File "/data/anaconda3/envs/python37/lib/python3.7/site-packages/pipelines-0.1.0a0-py3.7.egg/pipelines/pipelines/base.py", line 835, in _load_or_get_component
    raise Exception(f"Failed loading pipeline component '{name}': {e}")
Exception: Failed loading pipeline component 'PDFFileConverter': pdftotext is not installed. It is part of xpdf or poppler-utils software suite.
                
                   Installation on Linux:
                   wget --no-check-certificate https://dl.xpdfreader.com/xpdf-tools-linux-4.03.tar.gz &&
                   tar -xvf xpdf-tools-linux-4.03.tar.gz && sudo cp xpdf-tools-linux-4.03/bin64/pdftotext /usr/local/bin
                   
                   Installation on MacOS:
                   brew install xpdf

Install pdftotext from the xpdf tools (download page: http://www.xpdfreader.com/download.html):

wget --no-check-certificate https://dl.xpdfreader.com/xpdf-tools-linux-4.04.tar.gz && tar -xvf xpdf-tools-linux-4.04.tar.gz && sudo cp xpdf-tools-linux-4.04/bin64/pdftotext /usr/local/bin

1.5 Fix: install the dependencies first

Installation instructions for the pdftotext Python package: https://github.com/jalan/pdftotext

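sudo yum install gcc-c++ pkgconfig poppler-cpp-devel python3-devel
pip install pdftotext

Note that pip install pdftotext only installs Python bindings. The error above is about the pdftotext command-line program being missing from PATH, and that program comes from the xpdf tools or from the poppler-utils package (sudo yum install poppler-utils).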
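To confirm the fix before restarting rest_api/application.py, you can check whether a pdftotext executable is on PATH, which is what the error message complains about. A small sketch using shutil.which and subprocess; the function name is illustrative. It deliberately ignores the exit code of pdftotext -v and collects both stdout and stderr, since different xpdf and poppler builds may print the version banner to different streams.

```python
import shutil
import subprocess


def pdftotext_version():
    """Return the first line of `pdftotext -v`, or None if pdftotext is not on PATH."""
    exe = shutil.which("pdftotext")
    if exe is None:
        return None
    proc = subprocess.run(
        [exe, "-v"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
        timeout=10,
    )
    text = (proc.stdout + proc.stderr).strip()
    # Fall back to the executable path if the tool printed nothing
    return text.splitlines()[0] if text else exe
```

If this returns None, pipelines will keep failing with the same PDFFileConverter error.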

1.5.1 Error when sending a request

es/preprocessor/preprocessor.py", line 265, in split
    language=self.language)
  File "/data/anaconda3/envs/python37/lib/python3.7/site-packages/nltk/tokenize/__init__.py", line 106, in sent_tokenize
    tokenizer = load(f"tokenizers/punkt/{language}.pickle")
  File "/data/anaconda3/envs/python37/lib/python3.7/site-packages/nltk/data.py", line 750, in load
    opened_resource = _open(resource_url)
  File "/data/anaconda3/envs/python37/lib/python3.7/site-packages/nltk/data.py", line 876, in _open
    return find(path_, path + [""]).open()
  File "/data/anaconda3/envs/python37/lib/python3.7/site-packages/nltk/data.py", line 583, in find
    raise LookupError(resource_not_found)
LookupError: 
**********************************************************************
  Resource punkt not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('punkt')


1.5.2 Fix

(python37) [root@k8s-master02 pipelines]# python
Python 3.7.0 (default, Oct  9 2018, 10:31:47) 
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import nltk
>>> nltk.download('punkt')
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
True

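In a container (section 2) there is no interactive session, so the same check-and-download can be run from a setup script. A minimal sketch, assuming nltk 3.7 as pinned in requirements.txt (recent NLTK releases look for a resource named punkt_tab instead, so adjust the name if you upgrade). The function name and the optional download_dir parameter are illustrative; nltk.data.find and nltk.download are NLTK's own functions.

```python
import importlib.util


def ensure_punkt(download_dir=None):
    """Make sure NLTK's punkt tokenizer data is available.

    Returns True if punkt can be found (downloading it when missing),
    False if nltk is not installed or the download failed.
    """
    if importlib.util.find_spec("nltk") is None:
        return False
    import nltk

    if download_dir and download_dir not in nltk.data.path:
        # Make a custom data directory searchable, e.g. one baked into an image
        nltk.data.path.append(download_dir)
    try:
        nltk.data.find("tokenizers/punkt")
        return True
    except LookupError:
        pass
    # nltk.download returns True on success and False on failure
    return bool(nltk.download("punkt", download_dir=download_dir, quiet=True))
```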

1.5.3 Garbled text when uploading a file


1.5.4 Fix

Save the file with UTF-8 encoding, then upload it again.
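With many files, re-saving each one by hand is tedious. A minimal sketch that re-encodes a text file to UTF-8, assuming the original is either already UTF-8 or in GB18030/GBK (the usual default of editors on Chinese Windows); the candidate list and the function name are assumptions to adjust. GB18030 is a superset of GBK and GB2312, so it also covers those files. The result is written as bytes so that line endings are left untouched.

```python
from pathlib import Path


def convert_to_utf8(src, dst=None, candidates=("utf-8-sig", "gb18030")):
    """Decode src with the first encoding that works and write it out as UTF-8.

    "utf-8-sig" also accepts plain UTF-8 and strips a BOM if present.
    Returns the encoding that was used to decode the file.
    """
    raw = Path(src).read_bytes()
    for enc in candidates:
        try:
            text = raw.decode(enc)
        except UnicodeDecodeError:
            continue
        Path(dst or src).write_bytes(text.encode("utf-8"))
        return enc
    raise ValueError(f"{src}: none of {candidates} could decode the file")
```

Usage: convert_to_utf8("faq.txt") rewrites the file in place; pass dst to keep the original.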

1.7 Error when running python -m streamlit run ui/webapp_question_answering.py --server.port 8502

Traceback (most recent call last):
  File "/data/anaconda3/envs/python37/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/data/anaconda3/envs/python37/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/data/anaconda3/envs/python37/lib/python3.7/site-packages/streamlit-1.7.0-py3.7.egg/streamlit/__main__.py", line 21, in <module>
    main(prog_name="streamlit")
  File "/data/anaconda3/envs/python37/lib/python3.7/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/data/anaconda3/envs/python37/lib/python3.7/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/data/anaconda3/envs/python37/lib/python3.7/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/data/anaconda3/envs/python37/lib/python3.7/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/data/anaconda3/envs/python37/lib/python3.7/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/data/anaconda3/envs/python37/lib/python3.7/site-packages/streamlit-1.7.0-py3.7.egg/streamlit/cli.py", line 204, in main_run
    _main_run(target, args, flag_options=kwargs)
  File "/data/anaconda3/envs/python37/lib/python3.7/site-packages/streamlit-1.7.0-py3.7.egg/streamlit/cli.py", line 232, in _main_run
    command_line = _get_command_line_as_string()
  File "/data/anaconda3/envs/python37/lib/python3.7/site-packages/streamlit-1.7.0-py3.7.egg/streamlit/cli.py", line 221, in _get_command_line_as_string
    cmd_line_as_list.extend(click.get_os_args())
AttributeError: module 'click' has no attribute 'get_os_args'


1.8 Fix

Reference: https://blog.csdn.net/qq_37223079/article/details/124315174

pip install click==8.0.0
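The root cause is that click.get_os_args was deprecated in click 8.0 and removed in 8.1, while streamlit 1.7.0 still calls it, so any click release below 8.1 avoids the error. A small sketch for checking the installed click before launching the UI; both function names are illustrative.

```python
import importlib.util


def click_has_get_os_args(version):
    """True if this click version still provides click.get_os_args (removed in 8.1)."""
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) < (8, 1)


def installed_click_version():
    """Return click.__version__, or None if click is not installed."""
    if importlib.util.find_spec("click") is None:
        return None
    import click

    return click.__version__
```

Usage: if installed_click_version() returns a version for which click_has_get_os_args is False, pin click as shown above before starting streamlit.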

2. Running inside Docker

2.1 Writing requirements.txt

paddlenlp==2.3.3
paddleocr==2.5.0.3
requests==2.28.0
pydantic==1.9.1
mmh3==3.0.0
more-itertools==8.13.0
elasticsearch==7.10.0
SQLAlchemy==1.4.37
SQLAlchemy-Utils==0.38.2
langdetect==1.0.9
python-docx==0.8.11
nltk==3.7
pdfplumber==0.7.1
importlib-metadata==4.2.0
faiss-gpu==1.7.2

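Because several of these pins interact (the click problem in 1.7 is one example), it can help to check an environment against this file. A minimal sketch that parses name==version lines and compares them with the installed packages. It uses importlib.metadata on Python 3.8+ and falls back to the importlib_metadata backport, which this file pins as importlib-metadata for the Python 3.7 images. The function names are illustrative.

```python
try:
    from importlib.metadata import PackageNotFoundError, version
except ImportError:  # Python 3.7: use the importlib-metadata backport
    from importlib_metadata import PackageNotFoundError, version


def parse_pins(text):
    """Parse 'name==version' lines, skipping blanks, comments and other specifiers."""
    pins = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line or "==" not in line:
            continue
        name, _, ver = line.partition("==")
        pins[name.strip()] = ver.strip()
    return pins


def check_pins(pins, get_version=version):
    """Return {name: (expected, installed_or_None)} for every package that does not match."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = get_version(name)
        except PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches
```

Usage: with open("requirements.txt") as f: print(check_pins(parse_pins(f.read()))); an empty dict means every pin matches.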

2.2 Running with docker commands

docker pull registry.baidubce.com/paddlepaddle/paddle:latest-gpu-cuda10.1-cudnn7-gcc54-dev

nvidia-docker run -it --entrypoint=/bin/bash registry.baidubce.com/paddlepaddle/paddle:latest-gpu-cuda10.1-cudnn7-gcc54-dev

pip3.7 install paddlepaddle-gpu==2.3.0.post101 -f https://www.paddlepaddle.org.cn/whl/linux/mkl/avx/stable.html

pip3.7 install -r requirements.txt

python3.7 setup.py install

python3.7 examples/question-answering/dense_qa_example.py --device gpu
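Before running the example with --device gpu, it may be worth confirming inside the container that the installed paddlepaddle-gpu wheel can actually see the GPU. Paddle ships its own installation self-test, which can be run with python3.7 -c "import paddle; paddle.utils.run_check()". The sketch below is a quieter, illustrative wrapper (the function name is hypothetical) built on paddle.device.is_compiled_with_cuda and paddle.device.cuda.device_count.

```python
import importlib.util


def paddle_gpu_ready():
    """Return True if Paddle is importable, built with CUDA, and sees at least one GPU."""
    if importlib.util.find_spec("paddle") is None:
        return False
    import paddle

    if not paddle.device.is_compiled_with_cuda():
        return False
    return paddle.device.cuda.device_count() > 0


if __name__ == "__main__":
    print("Paddle GPU ready:", paddle_gpu_ready())
```

If this prints False inside a container started with nvidia-docker, check the CUDA version of the wheel (post101 means CUDA 10.1) against the image and the host driver.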

2.3 Building an image with a Dockerfile. Dockerfile contents:

FROM registry.baidubce.com/paddlepaddle/paddle:latest-gpu-cuda10.1-cudnn7-gcc54-dev


COPY . /deploy
WORKDIR /deploy

RUN pip config set global.index-url https://mirror.baidu.com/pypi/simple \
    && python3.7 -m pip install --upgrade setuptools \
    && python3.7 -m pip install --upgrade pip \
    && python3.7 -m pip install paddlepaddle-gpu==2.3.0.post101 \
    -f https://www.paddlepaddle.org.cn/whl/linux/mkl/avx/stable.html \
    # 1) Install the pipelines package\
    && cd pipelines/ \
    && pip3.7 install -r requirements.txt \
    && python3.7 setup.py install

ENTRYPOINT export CUDA_VISIBLE_DEVICES=0 && \
           python3.7 pipelines/examples/question-answering/dense_qa_example.py --device gpu


2.3.1 Building and running the image

nvidia-docker build -t chatbot-qa:1.0.0.0630 .

nvidia-docker run --name chatbot-qa -d chatbot-qa:1.0.0.0630

nvidia-docker exec -it chatbot-qa /bin/bash

docker logs chatbot-qa
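Disclaimer: This article was contributed by a user and does not represent the position of the wpsshop blog. Copyright belongs to the original author, and the site accepts no legal liability for it. If you find infringing content, please contact the site. When reposting, please credit the source: https://www.wpsshop.cn/w/繁依Fanyi0/article/detail/348228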