【LocalAI】（6）：在autodl上使用4090部署LocalAIGPU版本，成功运行qwen-1.5-32b大模型，占用显存18G，速度 84 tokens / s_qwen32b硬件

作者：盐析白兔 | 2024-07-08 12:43:31

踩

qwen32b硬件

1，关于LocalAI

LocalAI 是一个用于本地推理的，与 OpenAI API 规范兼容的 REST API。
它允许您在本地使用消费级硬件运行 LLM（不仅如此），支持与 ggml 格式兼容的多个模型系列。支持CPU硬件/GPU硬件。

https://www.bilibili.com/video/BV141421o7Lh/

【LocalAI】（6）：在autodl上使用4090部署LocalAIGPU版本，成功运行qwen-1.5-32b大模型，占用显存18G，速度 84t/s

2，关于qwen大模型1.5-32b

部署方法项目地址：

https://gitee.com/fly-llm/localai-run-llm

# 文件比较大，可以先进行下载，然后在注册模型
wget "https://modelscope.cn/api/v1/models/qwen/Qwen1.5-32B-Chat-GGUF/repo?Revision=master&FilePath=qwen1_5-32b-chat-q4_0.gguf"


curl http://localhost:8080/models/apply -H "Content-Type: application/json" -d '{
   "url": "https://gitee.com/fly-llm/localai-run-llm/raw/master/model-gallery/qwen1.5-32b.yaml",
   "name": "qwen1.5-32b-chat"
 }'
1
2
3
4
5
6
7
8

测试接口

curl -X 'POST' 'http://0.0.0.0:8080/v1/chat/completions' \
-H 'Content-Type: application/json' -d '{
    "model": "qwen1.5-32b-chat","stream":true,
    "messages": [
        {
            "role": "user",
            "content": "北京景点"
        }
    ]
}'
1
2
3
4
5
6
7
8
9
10

3，配置文件

需要手动修改下配置文件：

# https://github.com/mudler/LocalAI/issues/1110
# Model name.
# The model name is used to identify the model in the API calls.

name: "qwen-1.5-32b"

description: |
  qwen-1.5-32b

license: "Apache 2.0"
urls:
- https://github.com/QwenLM/Qwen1.5
- https://modelscope.cn/models/qwen/Qwen1.5-1.8B-Chat-GGUF/summary

config_file: |
    backend: llama
    parameters:
      model: qwen1_5-32b-chat-q4_0.gguf
      top_k: 80
      temperature: 1
      top_p: 0.7
    context_size: 1024
    template:
      completion: qwen-1.5-completion
      chat: qwen-1.5-chat
      chat-message: qwen-1.5-chat-message
files:
    - filename: "qwen1_5-32b-chat-q4_0.gguf"
      sha256: "0688760683b9ca390070d62d06bdba06593d200cf07456478e4baeb66655c64b"
      uri: "https://www.modelscope.cn/api/v1/models/qwen/Qwen1.5-32B-Chat-GGUF/repo?Revision=master&FilePath=qwen1_5-32b-chat-q4_0.gguf"

prompt_templates:
- name: "qwen-1.5-completion"
  content: |
      {{.Input}}
- name: "qwen-1.5-chat"
  content: |
        {{.Input}}
        <|im_start|>assistant
- name: "qwen-1.5-chat-message"
  content: |
    <|im_start|>{{if eq .RoleName "assistant"}}assistant{{else if eq .RoleName "system"}}system{{else if eq .RoleName "user"}}user{{end}}
    {{if .Content}}{{.Content}}{{end}}
    <|im_end|>
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44

配置成功之后就可以启动了。
24G的显存占用了 18G,同时速度还可以。

4，模型地址

https://www.modelscope.cn/models/qwen/Qwen1.5-32B-Chat-GGUF/summary
在这里插入图片描述

声明：本文内容由网友自发贡献，不代表【wpsshop博客】立场，版权归原作者所有，本站不承担相应法律责任。如您发现有侵权的内容，请联系我们。转载请注明出处：https://www.wpsshop.cn/w/盐析白兔/article/detail/798784