Check which packages your environment is missing and install them.
First make sure the model itself loads without errors in the current environment.
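A minimal load check along these lines, reusing the exact modelscope calls from the service code below, will surface missing packages and CUDA problems before any FastAPI code is involved:

# Smoke test: if this script finishes without errors, the service below can load the model too.
import torch
from modelscope import AutoTokenizer, AutoModel, snapshot_download

model_dir = snapshot_download("ZhipuAI/chatglm3-6b-32k", revision="v1.0.0")
tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
model = AutoModel.from_pretrained(model_dir, trust_remote_code=True).half().cuda()
print("model loaded; CUDA available:", torch.cuda.is_available())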
from fastapi import FastAPI
from pydantic import BaseModel
import uvicorn
import torch
import os
from modelscope import AutoTokenizer, AutoModel, snapshot_download

# Must be set before anything initializes CUDA (see the pitfalls section below).
os.environ['CUDA_VISIBLE_DEVICES'] = "1,0"

app = FastAPI()

class Query(BaseModel):
    text: str

# Download the model snapshot once, then load tokenizer and model from the local directory.
model_dir = snapshot_download("ZhipuAI/chatglm3-6b-32k", revision="v1.0.0")
tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
model = AutoModel.from_pretrained(model_dir, trust_remote_code=True).half().cuda()
model = torch.nn.DataParallel(model, device_ids=[0, 1])
# DataParallel hides the wrapped model's own methods (generate, chat, ...),
# so unwrap it back to the underlying module before using them.
if isinstance(model, torch.nn.DataParallel):
    model = model.module

@app.post("/chat/")
async def chat(query: Query):
    input_ids = tokenizer([query.text]).input_ids
    output_ids = model.generate(
        torch.as_tensor(input_ids).cuda(),
        do_sample=False,          # greedy decoding; temperature is ignored when sampling is off
        temperature=0.1,
        repetition_penalty=1.0,
        max_new_tokens=1024)
    # Drop the prompt tokens and keep only the newly generated part.
    output_ids = output_ids[0][len(input_ids[0]):]
    outputs = tokenizer.decode(output_ids, skip_special_tokens=True,
                               spaces_between_special_tokens=False)
    return {"result": outputs}


if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8060)
Run the file above to start the service (e.g. python api.py, if that is what you saved it as), then test it with the following client script:
import requests

# The route is registered as "/chat/", so keep the trailing slash.
url = "http://127.0.0.1:8060/chat/"
query = {'text': "hi"}
response = requests.post(url, json=query)
if response.status_code == 200:
    res = response.json()
    print("Chatglm3:", res["result"])
else:
    print("error:", response.status_code)
If running it returns a normal response, the API call succeeded!
Check the firewall and make sure port 8060 is allowed, or switch to a port that is allowed on your machine.
Query the firewall status: ufw status
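If the port is blocked, it can be opened with the standard ufw command (assuming ufw is the firewall in use on your machine):

sudo ufw allow 8060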
Pitfalls:
Setting os.environ['CUDA_VISIBLE_DEVICES'] = "1,0" in the code raised an error at the time (torch.cuda.DeferredCudaCallError: CUDA call failed lazily at initialization).
Resetting the GPU selection in bash solved it:
unset CUDA_VISIBLE_DEVICES
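The usual root cause is that CUDA was already initialized by the time the variable was changed, so the new value was applied too late. A safer pattern, sketched below, is to set the variable at the very top of the script, before anything touches CUDA:

import os
os.environ['CUDA_VISIBLE_DEVICES'] = "1,0"  # set before anything initializes CUDA

import torch
print(torch.cuda.device_count())  # should now report only the two visible GPUs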
Parallelism error:
AttributeError: 'DataParallel' object has no attribute 'xxxx'
Fix: after wrapping the model in DataParallel, add the following:

if isinstance(model, torch.nn.DataParallel):
    model = model.module
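The reason for the error: torch.nn.DataParallel only forwards the wrapped module's forward pass, while model-specific attributes and methods (generate, chat, ...) live on the inner model, reachable via .module. A minimal sketch of the failure and the fix (net is a stand-in module, not the ChatGLM model):

import torch

net = torch.nn.Linear(4, 4)           # stand-in for the real model
wrapped = torch.nn.DataParallel(net)

# wrapped.weight                      # AttributeError: 'DataParallel' object has no attribute 'weight'
print(wrapped.module.weight.shape)    # the inner module's attributes live on .module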