【LLM】-10-部署llama-3-chinese-8b-instruct-v3 大模型

作者：从前慢现在也慢 | 2024-08-05 00:54:56

踩

1、模型下载

4.2、聊天（chat completion）

4.3、多轮对话

4.4、文本嵌入向量

5、Java代码实现调用

由于在【LLM】-09-搭建问答系统-对输入Prompt检查-CSDN博客关于提示词注入问题上，

使用Langchain 配合 chatglm3-6b 无法从根本上防止注入攻击问题。

并且在Langchian中无法部署llama3模型（切换模型错误，原因暂未解决）

所以直接部署llama3中文大模型。

选择 llama-3-chinese-8b-instruct-v3 模型，需要16G显存。

部署使用参考文档 https://github.com/ymcui/Chinese-LLaMA-Alpaca

如何需要更大、更精确的模型参考魔搭社区

或者使用推荐/其他模型下载

1、模型下载

基于魔搭社区下载

git需要2.40 以上版本，git在低版本下载限制单个文件4G大小，但实际模式存在大于4G情况

git lfs install

git clone https://www.modelscope.cn/ChineseAlpacaGroup/llama-3-chinese-8b-instruct-v3.git

2、下载项目代码

git clone https://github.com/ymcui/Chinese-LLaMA-Alpaca-3.git

建议使用conda 环境


# 创建chatchat 环境
conda create -n llama3 python=3.11.8
 
# 激活环境
conda activate llama3

安装依赖


cd Chinese-LLaMA-Alpaca-3
pip install -r requirements.txt

3、启动模型

启动命令


python scripts/oai_api_demo/openai_api_server.py \
--base_model /path/to/base_model \
--lora_model /path/to/lora_model \
--gpus 0,1 \
--use_flash_attention_2

参数说明：

--base_model {base_model}：存放HF格式的Llama-3-Chinese-Instruct模型权重和配置文件的目录，可以是合并后的模型（此时无需提供--lora_model），也可以是转后HF格式后的原版Llama-3-Instruct模型（需要提供--lora_model）
--lora_model {lora_model}：Llama-3-Chinese-Instruct的LoRA解压后文件所在目录，也可使用
声明：本文内容由网友自发贡献，不代表【wpsshop博客】立场，版权归原作者所有，本站不承担相应法律责任。如您发现有侵权的内容，请联系我们。转载请注明出处：https://www.wpsshop.cn/w/从前慢现在也慢/article/detail/930194