Download with ModelScope
from modelscope.models import Model
model = Model.from_pretrained('qwen/Qwen1.5-14B-chat')
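If you only need the checkpoint on disk (for example, to point vLLM at a local path later), loading the full 14B model is unnecessary; a lighter sketch using ModelScope's snapshot_download, assuming the default cache directory is acceptable:

from modelscope import snapshot_download

# Download the weights only; returns the local directory path
model_dir = snapshot_download('qwen/Qwen1.5-14B-chat')
print(model_dir)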
Download with Hugging Face
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained('Qwen/Qwen1.5-14B-Chat')
A complete chat inference example (from the Qwen1.5-14B-Chat model card):

from transformers import AutoModelForCausalLM, AutoTokenizer
device = "cuda"  # the device to load the model onto

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen1.5-14B-Chat",
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen1.5-14B-Chat")

prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(device)

generated_ids = model.generate(
    model_inputs.input_ids,
    max_new_tokens=512
)
# Strip the prompt tokens so only the newly generated reply is decoded
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
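For interactive use you may prefer to print tokens as they are generated instead of waiting for the full completion; a minimal streaming sketch using transformers' built-in TextStreamer, reusing model, tokenizer, and model_inputs from the example above:

from transformers import TextStreamer

# Stream decoded tokens to stdout as they are generated;
# skip_prompt avoids echoing the chat template back
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
model.generate(model_inputs.input_ids, max_new_tokens=512, streamer=streamer)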
Launch an OpenAI-compatible server with vLLM
python -m vllm.entrypoints.openai.api_server \
--model qwen/Qwen1.5-14B-Chat --max-model-len 8192 --gpu-memory-utilization 0.95
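Once the server reports it is ready, you can sanity-check it by listing the served models (a standard endpoint of the OpenAI-compatible API); the name it returns is what the "model" field in requests below must match:

curl http://localhost:8000/v1/models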
Query the server
curl http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "qwen/Qwen1.5-7B-Chat",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "写一篇春天为主题的作文"}
],
"stop": ["<|im_end|>", "<|endoftext|>"]
}'
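The same endpoint can also be called from Python with the official openai client; a minimal sketch, assuming openai>=1.0 is installed and the server was launched as above (the api_key is a placeholder, since the server is unauthenticated by default):

from openai import OpenAI

# Point the client at the local vLLM server instead of api.openai.com
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

completion = client.chat.completions.create(
    model="qwen/Qwen1.5-14B-Chat",  # must match the --model name the server was started with
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Write a short essay on the theme of spring"},
    ],
)
print(completion.choices[0].message.content)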
References:
- https://huggingface.co/Qwen/Qwen1.5-14B-Chat
- https://developer.aliyun.com/article/1439006