
A quick tutorial on local containerized deployment of GLM-4-9B-chat


This tutorial assumes basic Docker knowledge and involves modifying Python code; it is not aimed at beginners.

Step 1: Download the official project

GitHub - THUDM/GLM-4: GLM-4 series: Open Multilingual Multimodal Chat LMs

Then, in the root directory of the official project, add a Dockerfile with the following content:

# Use nvidia/cuda:12.4.1-devel-ubuntu22.04 as the base image
FROM nvidia/cuda:12.4.1-devel-ubuntu22.04 AS dev
# Update the package index and install pip for Python 3 and Git
RUN apt-get update -y \
    && apt-get install -y python3-pip git \
    && rm -rf /var/lib/apt/lists/*
# Run ldconfig so the CUDA compat libraries can be found
RUN ldconfig /usr/local/cuda-12.4/compat/
# Set the working directory
WORKDIR /glm4
# Copy the demo folders and files into the container's working directory
COPY basic_demo /glm4/basic_demo
COPY composite_demo /glm4/composite_demo
COPY finetune_demo /glm4/finetune_demo
COPY LICENSE /glm4/LICENSE
COPY resources /glm4/resources
# Switch to the /glm4/basic_demo directory
WORKDIR /glm4/basic_demo
# Install the Python dependencies listed in requirements.txt
RUN python3 -m pip install --no-cache-dir -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple/
# Return to the previous working directory
WORKDIR /glm4
# Declare the port the container will listen on
EXPOSE 8000

Then build the image from the project root:

docker build -t you_need_name/glm4-9b:tag .

Wait for the image to build successfully.

Once the image is built, you also need to download the model weights in advance.
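If the weights are not already on your machine, one way to fetch them is with huggingface_hub. This is only a minimal sketch: the local_dir below is an example and should match the host path you mount into the container later.

# Minimal sketch: download the glm-4-9b-chat weights to a local directory
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="THUDM/glm-4-9b-chat",
    local_dir="/home/aicode/logs/AI_Order/glm-4-9b-chat",  # example path; use your own model directory
)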

After downloading the model, modify openai_api_server.py in basic_demo: change the MODEL_PATH and MAX_MODEL_LENGTH assignments as shown below, so that the model path and maximum context length can be set directly through environment variables when the container starts.

MODEL_PATH = os.environ.get('LOCAL_MODEL_PATH', 'THUDM/glm-4-9b-chat')
# Environment variables are strings, so cast the context length to int
MAX_MODEL_LENGTH = int(os.environ.get('LOCAL_MAX_MODEL_LENGTH', '8192'))
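These variable names (LOCAL_MODEL_PATH and LOCAL_MAX_MODEL_LENGTH) are the ones set in the environment section of the docker-compose.yml below, so the model path and context length can be changed at startup without rebuilding the image.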

After making these changes, use the following docker-compose.yml and run 'docker compose up -d' in the directory that contains it:

version: '3.8'
services:
  glm4_vllm_server:
    image: jjck/glm4-9b:20240606
    runtime: nvidia
    ipc: host
    restart: always
    environment:
      - LOCAL_MODEL_PATH=/glm4/glm-4-9b-chat
      - LOCAL_MAX_MODEL_LENGTH=8192
    ports:
      - "8101:8000"
    volumes:
      - "/etc/localtime:/etc/localtime:ro"
      - "/home/aicode/logs/AI_Order/glm-4-9b-chat:/glm4/glm-4-9b-chat"
      - "/home/pythonproject/GLM-4-main/basic_demo:/glm4/basic_demo"
    command: python3 basic_demo/openai_api_server.py
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ['0']
              capabilities: [gpu]

Of the two host paths in the volumes section, the first is where the model weights are stored and the second is the path to the basic_demo code; adjust both to match your own environment.

Alternatively, you can start the container with this command:

docker run --runtime nvidia --gpus device=0 \
  --name glm4_vllm_server \
  -p 8101:8000 \
  -e LOCAL_MODEL_PATH=/glm4/glm-4-9b-chat \
  -e LOCAL_MAX_MODEL_LENGTH=8192 \
  -v /etc/localtime:/etc/localtime:ro \
  -v /home/aicode/logs/AI_Order/glm-4-9b-chat:/glm4/glm-4-9b-chat \
  -v /home/pythonproject/GLM-4-main/basic_demo:/glm4/basic_demo \
  -itd xxx/glm4-9b:20240606 \
  python3 basic_demo/openai_api_server.py
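Once the container is up (via either method), you can send a quick smoke test to the OpenAI-compatible endpoint on port 8101. This is only a sketch: the model name "glm-4" is an assumption, so check the name actually registered by openai_api_server.py, and the api_key value is a placeholder you may need to adjust.

# Minimal sketch: call the OpenAI-compatible endpoint exposed by the container
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8101/v1", api_key="EMPTY")  # placeholder key
resp = client.chat.completions.create(
    model="glm-4",  # assumed model name; verify against openai_api_server.py
    messages=[{"role": "user", "content": "你好"}],
)
print(resp.choices[0].message.content)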

Problems I ran into while using the official api_server demo

With this official Tsinghua demo, an asynchronous background-loop problem can occur that causes user requests to fail. The error log looks like this:

 

glm4_vllm_api_server-glm4_vllm_server-1 | File "/usr/local/lib/python3.10/dist-packages/fastapi/routing.py", line 191, in run_endpoint_function
glm4_vllm_api_server-glm4_vllm_server-1 |   return await dependant.call(**values)
glm4_vllm_api_server-glm4_vllm_server-1 | File "/glm4/basic_demo/openai_api_server.py", line 344, in create_chat_completion
glm4_vllm_api_server-glm4_vllm_server-1 |   async for response in generate_stream_glm4(gen_params):
glm4_vllm_api_server-glm4_vllm_server-1 | File "/glm4/basic_demo/openai_api_server.py", line 199, in generate_stream_glm4
glm4_vllm_api_server-glm4_vllm_server-1 |   async for output in engine.generate(inputs=inputs, sampling_params=sampling_params, request_id="glm-4-9b"):
glm4_vllm_api_server-glm4_vllm_server-1 | File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 662, in generate
glm4_vllm_api_server-glm4_vllm_server-1 |   async for output in self._process_request(
glm4_vllm_api_server-glm4_vllm_server-1 | File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 756, in _process_request
glm4_vllm_api_server-glm4_vllm_server-1 |   stream = await self.add_request(
glm4_vllm_api_server-glm4_vllm_server-1 | File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 561, in add_request
glm4_vllm_api_server-glm4_vllm_server-1 |   self.start_background_loop()
glm4_vllm_api_server-glm4_vllm_server-1 | File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 431, in start_background_loop
glm4_vllm_api_server-glm4_vllm_server-1 |   raise AsyncEngineDeadError(
glm4_vllm_api_server-glm4_vllm_server-1 | vllm.engine.async_llm_engine.AsyncEngineDeadError: Background loop has errored already.

So do not use this demo in production. If you need it in a production environment, wait for an official fix.
