QAnything (Question and Answer based on Anything) is a local knowledge base question-answering system designed to support files in any format as well as databases, and it can be installed and used completely offline.
Drop in local files of any format and get an accurate, fast, and reliable question-answering experience.
Currently supported formats: PDF (pdf), Word (docx), PPT (pptx), XLS (xlsx), Markdown (md), email (eml), TXT (txt), images (jpg, jpeg, png), CSV (csv), web links (html), with more formats coming soon...
The advantage of two-stage retrieval is pronounced when the knowledge base is large. With first-stage embedding retrieval alone, accuracy degrades as the data volume grows (the green line in the project's benchmark chart); adding second-stage reranking keeps accuracy growing steadily, i.e., the more data, the better the results.
BCEmbedding, the retrieval component used by QAnything, has strong bilingual and cross-lingual capability. It bridges the gap between Chinese and English in semantic retrieval, which yields the following benchmark results:
Model Name | Retrieval | STS | PairClassification | Classification | Reranking | Clustering | Average |
---|---|---|---|---|---|---|---|
bge-base-en-v1.5 | 37.14 | 55.06 | 75.45 | 59.73 | 43.05 | 37.74 | 47.20 |
bge-base-zh-v1.5 | 47.60 | 63.72 | 77.40 | 63.38 | 54.85 | 32.56 | 53.60 |
bge-large-en-v1.5 | 37.15 | 54.09 | 75.00 | 59.24 | 42.68 | 37.32 | 46.82 |
bge-large-zh-v1.5 | 47.54 | 64.73 | 79.14 | 64.19 | 55.88 | 33.26 | 54.21 |
jina-embeddings-v2-base-en | 31.58 | 54.28 | 74.84 | 58.42 | 41.16 | 34.67 | 44.29 |
m3e-base | 46.29 | 63.93 | 71.84 | 64.08 | 52.38 | 37.84 | 53.54 |
m3e-large | 34.85 | 59.74 | 67.69 | 60.07 | 48.99 | 31.62 | 46.78 |
bce-embedding-base_v1 | 57.60 | 65.73 | 74.96 | 69.00 | 57.29 | 38.95 | 59.43 |
Model Name | Reranking | Average |
---|---|---|
bge-reranker-base | 57.78 | 57.78 |
bge-reranker-large | 59.69 | 59.69 |
bce-reranker-base_v1 | 60.06 | 60.06 |
| System | Required item | Minimum Requirement | Note |
| --- | --- | --- | --- |
| Linux amd64 | NVIDIA GPU Memory | >= 4GB (use OpenAI API) | Minimum: GTX 1050Ti (use OpenAI API); Recommended: RTX 3090 |
|  | NVIDIA Driver Version | >= 525.105.17 |  |
|  | Docker version | >= 20.10.5 | Docker install |
|  | docker compose version | >= 2.23.3 | docker compose install |
| System | Required item | Minimum Requirement | Note |
| --- | --- | --- | --- |
| Windows with WSL Ubuntu subsystem | NVIDIA GPU Memory | >= 4GB (use OpenAI API) | Minimum: GTX 1050Ti (use OpenAI API); Recommended: RTX 3090 |
|  | GEFORCE EXPERIENCE | >= 546.33 | GEFORCE EXPERIENCE download |
|  | Docker Desktop | >= 4.26.1 (131620) | Docker Desktop for Windows |
|  | git-lfs |  | git-lfs install |
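Before proceeding, you can quickly verify the prerequisites in both tables from a terminal (a WSL shell on Windows); a minimal check, assuming the standard NVIDIA and Docker CLIs are on your PATH:

```bash
# Check the GPU and driver version (driver >= 525.105.17 on Linux)
nvidia-smi

# Check Docker and Docker Compose versions (>= 20.10.5 / >= 2.23.3)
docker --version
docker compose version

# Check that git-lfs is installed (needed for cloning model repositories)
git lfs version
```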
Download the source code:

```bash
git clone https://github.com/netease-youdao/QAnything.git
```

Fetch the Embedding models:

```bash
git clone https://www.modelscope.cn/netease-youdao/QAnything.git
```

Download a large language model:

MiniChat-2-3B

```bash
git clone https://www.modelscope.cn/netease-youdao/MiniChat-2-3B.git
```

Qwen-7B

```bash
git clone https://www.modelscope.cn/netease-youdao/Qwen-7B-QAnything.git
```
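Model repositories like these typically store their weights via Git LFS (which the requirements table above lists as a prerequisite), so it is worth making sure LFS is active before cloning; a small sketch, assuming the clone destinations used above:

```bash
# Enable Git LFS once per machine so model weights are actually downloaded
git lfs install

# If a repo was cloned before LFS was enabled, fetch the large files explicitly
cd Qwen-7B-QAnything && git lfs pull
```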
Usage:
- bash run.sh [-c <llm_api>] [-i <device_id>] [-b <runtime_backend>] [-m <model_name>] [-t <conv_template>] [-p <tensor_parallel>] [-r <gpu_memory_utilization>] [-h]
- -c <llm_api>: Specify the LLM API mode, one of {local, cloud}; the default is 'local'. If you set '-c cloud', first set the environment variables {OPENAI_API_KEY, OPENAI_API_BASE, OPENAI_API_MODEL_NAME, OPENAI_API_CONTEXT_LENGTH} in the .env file (a sketch of such a file appears after this list).
- -i <device_id>: Specify the GPU device ID(s).
- -b <runtime_backend>: Specify the LLM inference runtime backend, one of {default, hf, vllm}.
- -m <model_name>: Specify the name of the public LLM model to load and serve through the FastChat serve API, e.g., one of {Qwen-7B-Chat, deepseek-llm-7b-chat, ...}.
- -t <conv_template>: Specify the conversation template to use with the public LLM model, e.g., one of {qwen-7b-chat, deepseek-chat, ...}.
- -p <tensor_parallel>: Set the tensor parallel size for the vllm backend, one of {1, 2}; the default is 1.
- -r <gpu_memory_utilization>: Specify the gpu_memory_utilization parameter for the vllm backend, in (0, 1]; the default is 0.81.
- -h: Show help information.
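For '-c cloud', the usage text above names four variables that must be present in the .env file. A minimal sketch of what that file might contain; the values below are placeholders, not defaults from the project:

```bash
# .env — placeholder values; replace with your own credentials and endpoint
OPENAI_API_KEY=sk-xxxxxxxxxxxxxxxx
OPENAI_API_BASE=https://api.openai.com/v1
OPENAI_API_MODEL_NAME=gpt-3.5-turbo
OPENAI_API_CONTEXT_LENGTH=4096
```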
Service Startup Command | GPUs | LLM Runtime Backend | LLM model |
---|---|---|---|
bash ./run.sh -c cloud -i 0 -b default | 1 | OpenAI API | OpenAI API |
bash ./run.sh -c local -i 0 -b default | 1 | FasterTransformer | Qwen-7B-QAnything |
bash ./run.sh -c local -i 0 -b hf -m MiniChat-2-3B -t minichat | 1 | Huggingface Transformers | Public LLM (e.g., MiniChat-2-3B) |
bash ./run.sh -c local -i 0 -b vllm -m MiniChat-2-3B -t minichat -p 1 -r 0.81 | 1 | vllm | Public LLM (e.g., MiniChat-2-3B) |
bash ./run.sh -c local -i 0,1 -b default | 2 | FasterTransformer | Qwen-7B-QAnything |
bash ./run.sh -c local -i 0,1 -b hf -m MiniChat-2-3B -t minichat | 2 | Huggingface Transformers | Public LLM (e.g., MiniChat-2-3B) |
bash ./run.sh -c local -i 0,1 -b vllm -m MiniChat-2-3B -t minichat -p 1 -r 0.81 | 2 | vllm | Public LLM (e.g., MiniChat-2-3B) |
bash ./run.sh -c local -i 0,1 -b vllm -m MiniChat-2-3B -t minichat -p 2 -r 0.81 | 2 | vllm | Public LLM (e.g., MiniChat-2-3B) |
- Note: choose the startup command that best fits your hardware.
- (1) When "-i 0,1" is set, the local embedding/rerank service runs on device gpu_id_1; otherwise it defaults to gpu_id_0.
- (2) When "-c cloud" is set, local Embedding/Rerank plus the OpenAI LLM API are used, requiring only about 4GB of VRAM (suitable for GPU devices with VRAM <= 8GB).
- (3) When using the OpenAI LLM API, you will be prompted to enter {OPENAI_API_KEY, OPENAI_API_BASE, OPENAI_API_MODEL_NAME, OPENAI_API_CONTEXT_LENGTH} right away.
- (4) "-b hf" is the most widely compatible way to run public LLM inference, but its performance is relatively poor.
- (5) When choosing a public Chat LLM for the QAnything system, pick a PROMPT_TEMPLATE setting appropriate to the specific LLM model.
- (6) The list of public LLMs supported through the FastChat API with the Huggingface Transformers/vllm backends is in "/path/to/QAnything/third_party/FastChat/fastchat/conversation.py".
Public LLMs supported through the FastChat API with the Huggingface Transformers/vllm runtime backends
model_name | conv_template | Supported Public LLM List |
---|---|---|
Qwen-7B-QAnything | qwen-7b-qanything | Qwen-7B-QAnything |
Qwen-1_8B-Chat/Qwen-7B-Chat/Qwen-14B-Chat | qwen-7b-chat | Qwen |
Baichuan2-7B-Chat/Baichuan2-13B-Chat | baichuan2-chat | Baichuan2 |
MiniChat-2-3B | minichat | MiniChat |
deepseek-llm-7b-chat | deepseek-chat | Deepseek |
Yi-6B-Chat | Yi-34b-chat | Yi |
chatglm3-6b | chatglm3 | ChatGLM3 |
... check or add conv_template for more LLMs in "/path/to/QAnything/third_party/FastChat/fastchat/conversation.py" |
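To see which conv_template names are available locally, one option is to search FastChat's conversation registry directly; a quick sketch, assuming the repository layout named above and that templates are registered with a `name="..."` field, as is usual in FastChat's conversation.py:

```bash
# List registered conversation template names in FastChat
grep -n 'name="' /path/to/QAnything/third_party/FastChat/fastchat/conversation.py | head -n 40
```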
Recommended for GPU devices with VRAM <= 16GB
- 1.1 Run Qwen-7B-QAnything

```bash
## Step 1. Download the public LLM model (e.g., Qwen-7B-QAnything) and save to "/path/to/QAnything/assets/custom_models"
## (Optional) Download Qwen-7B-QAnything from ModelScope: https://www.modelscope.cn/models/netease-youdao/Qwen-7B-QAnything
## (Optional) Download Qwen-7B-QAnything from Huggingface: https://huggingface.co/netease-youdao/Qwen-7B-QAnything
cd /path/to/QAnything/assets/custom_models
git clone https://huggingface.co/netease-youdao/Qwen-7B-QAnything

## Step 2. Execute the service startup command. Here we use "-b hf" to specify the Huggingface transformers backend,
## which loads the model in 8 bits but runs bf16 inference by default to save VRAM.
cd /path/to/QAnything
bash ./run.sh -c local -i 0 -b hf -m Qwen-7B-QAnything -t qwen-7b-qanything
```

- 1.2 Run a public LLM model (e.g., MiniChat-2-3B)

```bash
## Step 1. Download the public LLM model (e.g., MiniChat-2-3B) and save to "/path/to/QAnything/assets/custom_models"
cd /path/to/QAnything/assets/custom_models
git clone https://huggingface.co/GeneZC/MiniChat-2-3B

## Step 2. Execute the service startup command. Here we use "-b hf" to specify the Huggingface transformers backend,
## which loads the model in 8 bits but runs bf16 inference by default to save VRAM.
cd /path/to/QAnything
bash ./run.sh -c local -i 0 -b hf -m MiniChat-2-3B -t minichat
```
- 2.1 Run Qwen-7B-QAnything

```bash
## Step 1. Download the public LLM model (e.g., Qwen-7B-QAnything) and save to "/path/to/QAnything/assets/custom_models"
## (Optional) Download Qwen-7B-QAnything from ModelScope: https://www.modelscope.cn/models/netease-youdao/Qwen-7B-QAnything
## (Optional) Download Qwen-7B-QAnything from Huggingface: https://huggingface.co/netease-youdao/Qwen-7B-QAnything
cd /path/to/QAnything/assets/custom_models
git clone https://huggingface.co/netease-youdao/Qwen-7B-QAnything

## Step 2. Execute the service startup command. Here we use "-b vllm" to specify the vllm backend, which runs bf16 inference by default.
## Note: adjust gpu_memory_utilization yourself according to the model size to avoid running out of memory
## (gpu_memory_utilization=0.81 is the default for 7B models; here it is set to 0.85 via "-r 0.85").
cd /path/to/QAnything
bash ./run.sh -c local -i 0 -b vllm -m Qwen-7B-QAnything -t qwen-7b-qanything -p 1 -r 0.85
```

- 2.2 Run a public LLM model (e.g., MiniChat-2-3B)

```bash
## Step 1. Download the public LLM model (e.g., MiniChat-2-3B) and save to "/path/to/QAnything/assets/custom_models"
cd /path/to/QAnything/assets/custom_models
git clone https://huggingface.co/GeneZC/MiniChat-2-3B

## Step 2. Execute the service startup command. Here we use "-b vllm" to specify the vllm backend, which runs bf16 inference by default.
## Note: adjust gpu_memory_utilization yourself according to the model size to avoid running out of memory
## (gpu_memory_utilization=0.81 is the default for 7B models; here it is set to 0.5 via "-r 0.5").
cd /path/to/QAnything
bash ./run.sh -c local -i 0 -b vllm -m MiniChat-2-3B -t minichat -p 1 -r 0.5

## (Optional) Step 2. Specify the vllm backend with "-i 0,1 -p 2" for faster inference via tensor parallelism across 2 GPUs.
## bash ./run.sh -c local -i 0,1 -b vllm -m MiniChat-2-3B -t minichat -p 2 -r 0.5
cd /path/to/QAnything
bash ./run.sh -c local -i 0,1 -b vllm -m Qwen-7B-QAnything -t qwen-7b-qanything -p 2 -r 0.85
```
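When tuning "-r", it helps to see how much VRAM is actually free before the services start; a quick check using standard nvidia-smi query flags:

```bash
# Show total and currently used VRAM per GPU to guide gpu_memory_utilization
nvidia-smi --query-gpu=index,name,memory.total,memory.used --format=csv
```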
After the services start successfully, enter the following address in a browser to try it out:

http://your_host:5052/qanything/

To access the API, refer to the following address:

http://your_host:8777/api/
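Once the services are up, a quick reachability check from the command line can confirm that both ports are serving; a minimal sketch, assuming the default host and ports above:

```bash
# Frontend should answer on port 5052
curl -I http://your_host:5052/qanything/

# API service should answer on port 8777
curl -I http://your_host:8777/api/
```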
- DEBUG
- To view the relevant logs, check the log files under the QAnything/logs/debug_logs directory.
- debug.log: user request handling log
- sanic_api.log: backend service runtime log
- llm_embed_rerank_tritonserver.log (single-GPU deployment): startup log for the tritonserver serving the LLM, embedding, and rerank models
- llm_tritonserver.log (multi-GPU deployment): startup log for the LLM tritonserver
- embed_rerank_tritonserver.log (multi-GPU deployment, or when using the OpenAI API): startup log for the embedding and rerank tritonserver
- rerank_server.log: rerank service runtime log
- ocr_server.log: OCR service runtime log
- npm_server.log: frontend service runtime log
- llm_server_entrypoint.log: LLM relay service runtime log
- fastchat_logs/*.log: FastChat service runtime logs
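To follow a log live while reproducing an issue, standard tail works on any of the files listed above; for example:

```bash
# Stream the request-handling log; substitute any file from the list above
tail -f QAnything/logs/debug_logs/debug.log
```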
To shut down all services, run:

```bash
bash close.sh
```
Offline installation: first pull and package the Docker images on a machine with internet access, then copy everything to the offline machine.

```bash
# First, pull the docker images on a machine with internet access
docker pull quay.io/coreos/etcd:v3.5.5
docker pull minio/minio:RELEASE.2023-03-20T20-16-18Z
docker pull milvusdb/milvus:v2.3.4
docker pull mysql:latest
docker pull freeren/qanything:v1.2.1

# Package the images
docker save quay.io/coreos/etcd:v3.5.5 minio/minio:RELEASE.2023-03-20T20-16-18Z milvusdb/milvus:v2.3.4 mysql:latest freeren/qanything:v1.2.1 -o qanything_offline.tar

# Download the QAnything source code
wget https://github.com/netease-youdao/QAnything/archive/refs/heads/master.zip

# Copy the image archive qanything_offline.tar and the code QAnything-master.zip to the offline machine
cp QAnything-master.zip qanything_offline.tar /path/to/your/offline/machine

# Load the images on the offline machine
docker load -i qanything_offline.tar

# Unzip the code and run
unzip QAnything-master.zip
cd QAnything-master
bash run.sh
```
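After loading the archive on the offline machine, you can confirm that all five images arrived intact before starting the stack:

```bash
# All five images pulled above should appear in this list
docker images | grep -E 'etcd|minio|milvus|mysql|qanything'
```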
QAnything github: https://github.com/netease-youdao/QAnything
QAnything gitee: QAnything, a local knowledge base question-answering system supporting files of any format and databases, installable and usable offline.
Qwen github: https://github.com/QwenLM/Qwen