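LocalAI is a REST API for local inference that is compatible with the OpenAI API specification.
It lets you run LLMs (and more) locally on consumer-grade hardware and supports multiple model families compatible with the ggml format. It runs on both CPU and GPU hardware.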
https://www.bilibili.com/video/BV141421o7Lh/
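[LocalAI] (6): Deploying the GPU version of LocalAI on a 4090 on AutoDL and successfully running the qwen-1.5-32b large model, using 18 GB of VRAM at 84 t/s
Deployment method and project repository: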
https://gitee.com/fly-llm/localai-run-llm
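# The file is large, so download it first and then register the model.
# -O saves it under the filename the model config expects instead of a name derived from the query string.
wget -O qwen1_5-32b-chat-q4_0.gguf "https://modelscope.cn/api/v1/models/qwen/Qwen1.5-32B-Chat-GGUF/repo?Revision=master&FilePath=qwen1_5-32b-chat-q4_0.gguf"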
curl http://localhost:8080/models/apply -H "Content-Type: application/json" -d '{
"url": "https://gitee.com/fly-llm/localai-run-llm/raw/master/model-gallery/qwen1.5-32b.yaml",
"name": "qwen1.5-32b-chat"
}'
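The same registration call can be sketched in Python using only the standard library. This is a minimal sketch that mirrors the curl command above; the helper names `build_apply_request` and `send` are made up for illustration, and the response is returned as raw text because its exact shape is not shown in this post.

```python
import json
import urllib.request


def build_apply_request(base_url, gallery_url, name):
    """Build (but do not send) the POST request for /models/apply."""
    payload = {"url": gallery_url, "name": name}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/models/apply",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req, timeout=30.0):
    """Send the request to a running LocalAI instance and return the raw body."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


req = build_apply_request(
    "http://localhost:8080",
    "https://gitee.com/fly-llm/localai-run-llm/raw/master/model-gallery/qwen1.5-32b.yaml",
    "qwen1.5-32b-chat",
)
# Only inspect the request here; call send(req) once the server is running.
print(req.get_method(), req.full_url)
```

Calling `send(req)` performs the actual registration and needs LocalAI listening on port 8080.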
Test the API:
curl -X 'POST' 'http://localhost:8080/v1/chat/completions' \
-H 'Content-Type: application/json' -d '{
"model": "qwen1.5-32b-chat","stream":true,
"messages": [
{
"role": "user",
"content": "北京景点"
}
]
}'
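Because the request sets "stream": true, the reply arrives as a series of chunks rather than one JSON document. The sketch below shows one way to reassemble the text, assuming LocalAI follows the OpenAI streaming convention of `data: {...}` lines terminated by `data: [DONE]`. The sample lines are invented for illustration and are not real server output.

```python
import json


def iter_stream_text(lines):
    """Yield the text fragments carried by OpenAI-style streaming lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and non-data fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


# Hypothetical sample of what a stream might look like.
sample = [
    'data: {"choices":[{"delta":{"role":"assistant"}}]}',
    "",
    'data: {"choices":[{"delta":{"content":"Hello"}}]}',
    'data: {"choices":[{"delta":{"content":", world"}}]}',
    "data: [DONE]",
]
print("".join(iter_stream_text(sample)))
```

In practice the `lines` argument would be the decoded lines of the HTTP response body, read incrementally.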
The configuration file needs to be edited manually:
# https://github.com/mudler/LocalAI/issues/1110
# Model name.
# The model name is used to identify the model in the API calls.
name: "qwen-1.5-32b"
description: |
  qwen-1.5-32b
license: "Apache 2.0"
urls:
- https://github.com/QwenLM/Qwen1.5
- https://modelscope.cn/models/qwen/Qwen1.5-1.8B-Chat-GGUF/summary
config_file: |
  backend: llama
  parameters:
    model: qwen1_5-32b-chat-q4_0.gguf
    top_k: 80
    temperature: 1
    top_p: 0.7
  context_size: 1024
  template:
    completion: qwen-1.5-completion
    chat: qwen-1.5-chat
    chat-message: qwen-1.5-chat-message
files:
- filename: "qwen1_5-32b-chat-q4_0.gguf"
  sha256: "0688760683b9ca390070d62d06bdba06593d200cf07456478e4baeb66655c64b"
  uri: "https://www.modelscope.cn/api/v1/models/qwen/Qwen1.5-32B-Chat-GGUF/repo?Revision=master&FilePath=qwen1_5-32b-chat-q4_0.gguf"
prompt_templates:
- name: "qwen-1.5-completion"
  content: |
    {{.Input}}
- name: "qwen-1.5-chat"
  content: |
    {{.Input}}
    <|im_start|>assistant
- name: "qwen-1.5-chat-message"
  content: |
    <|im_start|>{{if eq .RoleName "assistant"}}assistant{{else if eq .RoleName "system"}}system{{else if eq .RoleName "user"}}user{{end}}
    {{if .Content}}{{.Content}}{{end}}
    <|im_end|>
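To see what the chat templates above produce, here is a rough Python approximation of how a message list turns into the prompt text. It is only a sketch: it assumes the rendered messages are joined with newlines to form `.Input`, and the exact whitespace LocalAI emits may differ. The function names and the English prompt are made up for illustration.

```python
KNOWN_ROLES = {"assistant", "system", "user"}


def render_message(role, content):
    """Approximate the qwen-1.5-chat-message template for one message."""
    # Mirrors the if / else-if chain: an unknown role renders as an empty name.
    role_name = role if role in KNOWN_ROLES else ""
    return "<|im_start|>" + role_name + "\n" + (content or "") + "\n<|im_end|>"


def render_chat(messages):
    """Approximate the qwen-1.5-chat template: all messages, then the assistant cue."""
    # Assumption: the rendered messages are joined with newlines to form .Input.
    joined = "\n".join(render_message(m["role"], m.get("content")) for m in messages)
    return joined + "\n<|im_start|>assistant\n"


print(render_chat([{"role": "user", "content": "Sights in Beijing"}]))
```

The trailing `<|im_start|>assistant` line is what prompts the model to write the reply.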
Once the configuration is in place, the service can be started.
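Since the model file is large, it may be worth checking the download against the sha256 value listed in the gallery file before starting. The following is a minimal sketch using `hashlib`; the helper names are made up, and the file is assumed to sit in the current directory.

```python
import hashlib

# Checksum copied from the "files" entry of the gallery file.
EXPECTED_SHA256 = "0688760683b9ca390070d62d06bdba06593d200cf07456478e4baeb66655c64b"


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so a multi-gigabyte model is never loaded whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Return True when the file's digest equals the expected hex string."""
    return sha256_of_file(path) == expected.lower()
```

For example, `matches_expected("qwen1_5-32b-chat-q4_0.gguf")` should return True for an intact download.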
The model uses 18 GB of the card's 24 GB of VRAM, and the speed is acceptable.
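The 18 GB figure is roughly what a back-of-the-envelope estimate predicts for q4_0 weights. In the ggml q4_0 layout each block of 32 weights takes 18 bytes (16 bytes of 4-bit values plus a 2-byte fp16 scale), which works out to 4.5 bits per weight. The sketch below assumes a nominal 32 billion parameters and counts weights only. Real GGUF files keep some tensors in other formats, and the KV cache and runtime buffers add to the total, so treat it as an estimate.

```python
Q4_0_BLOCK_WEIGHTS = 32  # weights stored per q4_0 block
Q4_0_BLOCK_BYTES = 18    # 16 bytes of 4-bit values + 2-byte fp16 scale


def q4_0_weight_bytes(n_params):
    """Estimate bytes needed to hold n_params weights in q4_0 (weights only)."""
    return n_params * Q4_0_BLOCK_BYTES / Q4_0_BLOCK_WEIGHTS


NOMINAL_PARAMS = 32_000_000_000  # assumption: the nominal "32B" parameter count

estimate_gb = q4_0_weight_bytes(NOMINAL_PARAMS) / 1e9
print(f"{estimate_gb:.1f} GB")
```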
https://www.modelscope.cn/models/qwen/Qwen1.5-32B-Chat-GGUF/summary