书生·浦语 (InternLM) Large Model Hands-on Camp: Efficient Llama 3 Deployment in Practice (LMDeploy Edition)
On InternStudio you can set up the environment directly with:
studio-conda -t Llama3_lmdeploy -o pytorch-2.1.2
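studio-conda is an InternStudio-specific helper; it creates a new conda environment (Llama3_lmdeploy here) cloned from the platform's shared pytorch-2.1.2 base. Outside InternStudio, a rough equivalent (a sketch, assuming a plain CUDA machine; package versions may need adjusting) would be:

conda create -n Llama3_lmdeploy python=3.10 -y
conda activate Llama3_lmdeploy
pip install lmdeploy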
Symlink the model already available in InternStudio:
mkdir -p ~/model
ln -s /root/share/new_models/meta-llama/Meta-Llama-3-8B-Instruct ~/model/Meta-Llama-3-8B-Instruct
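Optionally, confirm the symlink resolves to the actual weights before going further:

ls -lh ~/model/Meta-Llama-3-8B-Instruct/

You should see the model's safetensors shards along with the tokenizer and config files.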
A few easily confusing points are flagged along the way.

Running the model with the Transformers library

Before using the Transformers library, make sure the installed version is new enough: Llama 3 support requires transformers 4.40.0.
pip install transformers==4.40.0
Run touch /root/pipeline_transformer.py, copy the code below into the file, and save it:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained(
    "/root/model/Meta-Llama-3-8B-Instruct", trust_remote_code=True
)

# Set `torch_dtype=torch.float16` to load the model in float16; otherwise it
# is loaded as float32 and causes an OOM error.
model = AutoModelForCausalLM.from_pretrained(
    "/root/model/Meta-Llama-3-8B-Instruct",
    torch_dtype=torch.float16,
    trust_remote_code=True,
).cuda()
model = model.eval()

# The system prompt (in Chinese) says: "You are now a friendly robot; you may
# only answer in Chinese."
messages = [
    {"role": "system", "content": "你现在是一个友好的机器人,回答的时候只能使用中文"},
    {"role": "user", "content": "你好"},
]

input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Llama 3 ends an assistant turn with <|eot_id|>, so stop on it as well as on
# the tokenizer's default eos token.
terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>"),
]

outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)
response = outputs[0][input_ids.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))
Run the script:
python /root/pipeline_transformer.py
If the model produces normal output like the following, the downloaded weights are fine:
(Llama3_lmdeploy) root@intern-studio-061925:~# python /root/pipeline_transformer.py
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████| 4/4 [00:41<00:00, 10.42s/it]
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
你好!很高兴见到你!如何可以帮助你?
(Llama3_lmdeploy) root@intern-studio-061925:~#
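One of the easily confusing points flagged above is the stop condition: Llama 3 ends each assistant turn with the special token <|eot_id|> rather than the tokenizer's default eos token (<|end_of_text|>, id 128001 in the log above), which is why the script passes both ids as terminators. To see exactly what the chat template feeds the model, you can render it as text instead of token ids (a small sketch reusing the tokenizer and messages from the script above):

# Render the chat template as a plain string instead of token ids, to inspect
# the special tokens (<|begin_of_text|>, <|start_header_id|>, <|eot_id|>, ...).
prompt_text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt_text)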
Next, you can chat with the model interactively using LMDeploy. Run directly in the terminal:
lmdeploy chat /root/model/Meta-Llama-3-8B-Instruct
Note that Llama 3 tends to answer in English, especially for slightly more complex questions: it will answer a simple Chinese question in Chinese, but as soon as the question gets more involved it switches entirely to English, as in the session below.
(Llama3_lmdeploy) root@intern-studio-061925:~#
(Llama3_lmdeploy) root@intern-studio-061925:~# lmdeploy chat /root/model/Meta-Llama-3-8B-Instruct
2024-04-24 09:38:44,886 - lmdeploy - WARNING - Fallback to pytorch engine because turbomind engine is not installed correctly. If you insist to use turbomind engine, you may need to reinstall lmdeploy from pypi or build from source and try again.
2024-04-24 09:38:57,493 - lmdeploy - INFO - Checking environment for PyTorch Engine.
2024-04-24 09:39:00,136 - lmdeploy - INFO - Checking model.
2024-04-24 09:39:00,137 - lmdeploy - WARNING - LMDeploy requires transformers version: [4.33.0 ~ 4.38.2], but found version: 4.40.0
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████| 4/4 [00:37<00:00, 9.42s/it]
2024-04-24 09:39:48,210 - lmdeploy - INFO - build CacheEngine with config:CacheConfig(block_size=64, num_cpu_blocks=512, num_gpu_blocks=512, window_size=-1, cache_max_entry_count=0.8, max_prefill_token_num=4096)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
match template: <llama3>
double enter to end input >>> 你好
<|begin_of_text|><|start_header_id|>user<|end_header_id|>
你好<|eot_id|><|start_header_id|>assistant<|end_header_id|>
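Two warnings in this transcript are worth decoding. The fallback message means the chat above is running on LMDeploy's PyTorch engine rather than the faster TurboMind engine; the warning's own suggestion (reinstalling lmdeploy from PyPI, e.g. pip install -U lmdeploy, or building from source) is the usual fix if you want TurboMind. The transformers version warning is expected here: Llama 3 needs transformers 4.40.0, which is newer than the [4.33.0 ~ 4.38.2] range this LMDeploy release was tested against, and the PyTorch engine still works despite it.

Besides the interactive CLI, LMDeploy also exposes a Python pipeline API, which is handy for scripting. A minimal sketch follows (sampling and engine settings are left at LMDeploy's defaults; treat the exact Response fields as an assumption to verify against your installed version):

from lmdeploy import pipeline

# Build an inference pipeline on the local Llama 3 weights; LMDeploy uses the
# TurboMind engine if available, otherwise falls back to the PyTorch engine.
pipe = pipeline("/root/model/Meta-Llama-3-8B-Instruct")

# The pipeline accepts a batch of prompts and returns one response per prompt.
responses = pipe(["你好", "Briefly introduce yourself."])
for r in responses:
    print(r.text)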