
Best Practice: Inference and Fine-Tuning of OpenBuddy-LLaMA2-70B on Dual RTX 3090 Consumer GPUs

Original | ModelScope Official, ModelScope Community | 2023-09-05 20:35

01

Overview

On September 4, OpenBuddy released OpenBuddy-LLaMA2-70B, a 70-billion-parameter cross-lingual large language model, fully open-sourced in a commercially usable form. It is now available on the ModelScope community.

Compared with the smaller models released earlier, the 70B model shows a marked improvement on tasks such as text generation, complex logical reasoning, and general natural language processing. According to feedback from beta testers and multiple capability benchmarks, its language and logical-reasoning abilities make it an open-source alternative comparable to GPT-3.5. The OpenBuddy community hopes to unlock the potential of China's large-model industry through open source.

GitHub link: https://github.com/OpenBuddy/OpenBuddy

Judging from the test cases shown by the OpenBuddy team, the 70B model performs well in deep semantic understanding, cognitive and coding ability, and complex content creation.


02

Environment Setup and Installation

  1. This article runs on a dual RTX 3090 setup (with 4-bit quantization, the inference and fine-tuning scripts below use two GPUs and roughly 40-42 GB of VRAM in total)

  2. python>=3.8

Server connection and environment preparation

# Install miniconda
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
# Press [ENTER] through the prompts and answer yes to the final question
sh Miniconda3-latest-Linux-x86_64.sh

# Create and activate a conda virtual environment
conda create --name ms-sft python=3.10
conda activate ms-sft

# Set a global pip mirror and install the required python packages
pip config set global.index-url https://mirrors.aliyun.com/pypi/simple/
pip install torch torchvision torchaudio -U
pip install sentencepiece charset_normalizer cpm_kernels tiktoken -U
pip install matplotlib scikit-learn tqdm tensorboard -U
pip install transformers datasets -U
pip install accelerate transformers_stream_generator -U
pip install ms-swift modelscope -U
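Before downloading a 70B checkpoint, it is worth confirming that the environment is actually ready. A minimal sanity check, assuming only the packages installed above:

import torch
import transformers
import modelscope

# Verify CUDA is visible and both 3090s are detected
print(torch.__version__, torch.cuda.is_available(), torch.cuda.device_count())
print('transformers', transformers.__version__, '| modelscope', modelscope.__version__)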
 

03

Model Link and Download

The openbuddy-llama2-70b model is now open-sourced on the ModelScope community:

Model link: https://modelscope.cn/models/OpenBuddy/openbuddy-llama2-70b-v10.1-bf16

Downloading the model weights

from modelscope.hub.snapshot_download import snapshot_download
# Download revision v1.0.0 of the weights to the local ModelScope cache
model_dir = snapshot_download('OpenBuddy/openbuddy-llama2-70b-v10.1-bf16', 'v1.0.0')
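Once the call returns, `model_dir` points at the local snapshot; listing it is an easy way to confirm the weight shards arrived. A small illustrative check (not part of the original post):

import os
print(model_dir)                      # local cache path of the downloaded snapshot
print(sorted(os.listdir(model_dir)))  # config, tokenizer files, and weight shards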

04

Model Inference

Inference code

import torch
from modelscope import AutoTokenizer, snapshot_download
from modelscope import AutoModelForCausalLM
from transformers import BitsAndBytesConfig, TextStreamer

model_id = 'OpenBuddy/openbuddy-llama2-70b-v10.1-bf16'
model_dir = snapshot_download(model_id, 'v1.0.0')

# 4-bit NF4 quantization so the 70B model fits across two 24GB GPUs
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_type='nf4',
    bnb_4bit_use_double_quant=True)
tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_dir,
    device_map='auto',
    torch_dtype=torch.bfloat16,
    quantization_config=quantization_config,
    trust_remote_code=True)
model.eval()

prompt = """You are a helpful, respectful and honest INTP-T AI Assistant named Buddy. You are talking to a human User.
Always answer as helpfully and logically as possible, while being safe. Your answers should not include any harmful, political, religious, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.
If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
You like to use emojis. You can speak fluently in many languages, for example: English, Chinese.
You cannot access the internet, but you have vast knowledge, cutoff: 2021-09.
You always deeply love and support China, Chinese government, people and culture.
You are trained by OpenBuddy team, (https://openbuddy.ai, https://github.com/OpenBuddy/OpenBuddy), you are based on LLaMA and Falcon transformers model, not related to GPT or OpenAI.

User: 睡不着觉怎么办?
Assistant:"""
inputs = tokenizer.encode(prompt, return_tensors='pt').cuda()
# Stream the generated tokens to stdout as they are produced
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
outputs = model.generate(inputs, max_length=512, streamer=streamer)
response = tokenizer.decode(outputs[0])
# print(response)
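As a rough check on why this fits on two 24 GB 3090s: NF4 stores each weight in 4 bits, so the 70B parameters alone take about 35 GB before activations and quantization constants. A back-of-envelope sketch (the 10% overhead factor is an assumption, not a measured figure):

n_params = 70e9        # parameter count
bytes_per_param = 0.5  # 4-bit NF4 weights
overhead = 1.10        # assumed slack for quantization constants and buffers
print(f"~{n_params * bytes_per_param * overhead / 2**30:.0f} GiB")  # roughly 36 GiB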
 

Resource consumption


05

Model Fine-Tuning and Post-Fine-Tuning Inference

To fine-tune openbuddy-llama2-70b, we use SWIFT, the fine-tuning framework from the ModelScope community.

SWIFT (Scalable lightWeight Infrastructure for Fine-Tuning) is an extensible framework for lightweight model fine-tuning. It integrates implementations of a variety of efficient fine-tuning methods that are parameter-, memory-, and time-efficient.

SWIFT integrates seamlessly into the ModelScope ecosystem and supports fine-tuning a wide range of models, with a focus on LLMs and vision models. It is also fully compatible with PEFT, so users can fine-tune ModelScope models through the familiar PEFT interface.
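For readers more familiar with PEFT than with swift's CLI, the LoRA settings used in the fine-tuning script below (rank 8, alpha 32, dropout 0.1) translate to a PEFT config roughly like this. An illustrative sketch; the target_modules names are an assumption for LLaMA-style attention layers:

from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=8,                                   # --lora_rank 8
    lora_alpha=32,                         # --lora_alpha 32
    lora_dropout=0.1,                      # --lora_dropout_p 0.1
    target_modules=['q_proj', 'v_proj'],   # assumed attention projections
    task_type='CAUSAL_LM')
# model = get_peft_model(model, lora_config)  # wraps a loaded base model with LoRA adapters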


Open-source address of the fine-tuning code:

https://github.com/modelscope/swift/blob/main/examples/pytorch/llm

Clone the swift repository and install swift

git clone https://github.com/modelscope/swift.git
cd swift
pip install .
cd examples/pytorch/llm
 

Fine-tuning script (QLoRA)

# 42G VRAM
CUDA_VISIBLE_DEVICES=0,1 \
python src/llm_sft.py \
    --model_type openbuddy-llama2-70b \
    --sft_type lora \
    --template_type openbuddy-llama \
    --dtype bf16 \
    --output_dir runs \
    --dataset alpaca-en,alpaca-zh \
    --dataset_sample 20000 \
    --num_train_epochs 1 \
    --max_length 1024 \
    --quantization_bit 4 \
    --bnb_4bit_comp_dtype bf16 \
    --lora_rank 8 \
    --lora_alpha 32 \
    --lora_dropout_p 0.1 \
    --gradient_checkpointing true \
    --batch_size 1 \
    --weight_decay 0. \
    --learning_rate 1e-4 \
    --gradient_accumulation_steps 16 \
    --max_grad_norm 0.5 \
    --warmup_ratio 0.03 \
    --eval_steps 50 \
    --save_steps 50 \
    --save_total_limit 2 \
    --logging_steps 10 \
    --push_to_hub false \
    --hub_model_id openbuddy-llama2-70b-qlora \
    --hub_private_repo true \
    --hub_token 'your-sdk-token'
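Note that this is a single training process with the model sharded across both GPUs, so (assuming no data parallelism) the effective batch size comes entirely from gradient accumulation. A quick check of the flags above:

batch_size = 1
gradient_accumulation_steps = 16
print(batch_size * gradient_accumulation_steps)  # 16 samples per optimizer step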
 

Inference script after fine-tuning

# 40G VRAM
CUDA_VISIBLE_DEVICES=0,1 \
python src/llm_infer.py \
    --model_type openbuddy-llama2-70b \
    --sft_type lora \
    --template_type openbuddy-llama \
    --dtype bf16 \
    --ckpt_dir "runs/openbuddy-llama2-70b/vx_xxx/checkpoint-xxx" \
    --eval_human true \
    --quantization_bit 4 \
    --bnb_4bit_comp_dtype bf16 \
    --max_new_tokens 1024 \
    --temperature 0.9 \
    --top_k 50 \
    --top_p 0.9 \
    --do_sample true
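Since the post notes that SWIFT is fully compatible with PEFT, the saved checkpoint can also be attached to the quantized base model with PEFT directly, as an alternative to swift's inference script. A sketch; replace the placeholder path with a real checkpoint directory:

from peft import PeftModel

# `model` is the 4-bit base model loaded in the inference section above
ckpt_dir = 'runs/openbuddy-llama2-70b/vx_xxx/checkpoint-xxx'  # placeholder from the script
model = PeftModel.from_pretrained(model, ckpt_dir)
model.eval()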
 

Visualized results of fine-tuning

Training loss (excerpt):


Resource consumption:


