Fine-tuning your own multimodal image-text understanding model, LLaVA, based on Llama3-8B-Instruct and the Image Projector pretrained by the XTuner team.
Course document: Llama3-Tutorial/docs/llava.md at main · SmartFlowAI/Llama3-Tutorial · GitHub
Environment: reuse the environment, XTuner, and Llama3-Tutorial set up in the previous lessons.
Llama3 weights: use the Llama3-8B-Instruct weights already symlinked in a previous lesson.
Visual Encoder weights: the openai/clip-vit-large-patch14-336 weights that LLaVA requires (referenced via symlink).
Image Projector weights: use the projector pretrained by the XTuner team.
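The weight preparation above can be sketched with symlinks. The link targets under `~/model` match the paths used by the `xtuner chat` command later in this walkthrough; the `/root/share/...` source paths are assumptions and should be replaced with wherever the weights actually live on your machine:

```shell
# Assumed source paths; adjust to your own environment.
mkdir -p ~/model
# Llama3-8B-Instruct weights (already downloaded/linked in earlier lessons)
ln -sf /root/share/Meta-Llama-3-8B-Instruct ~/model/Meta-Llama-3-8B-Instruct
# Visual Encoder weights required by LLaVA
ln -sf /root/share/clip-vit-large-patch14-336 ~/model/clip-vit-large-patch14-336
```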
xtuner train ~/Llama3-Tutorial/configs/llama3-llava/llava_llama3_8b_instruct_qlora_clip_vit_large_p14_336_lora_e1_finetune.py --work-dir ~/llama3_llava_pth --deepspeed deepspeed_zero2
The launch fails because DeepSpeed is not installed; install it first, then retry.
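A minimal fix sketch, assuming a standard pip environment (DeepSpeed is required by the `--deepspeed` flag above):

```shell
# Install DeepSpeed, then relaunch the training command.
pip install deepspeed
```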
A 30% slice of an A100 does not seem to be enough; after adding offload and retrying, training launches successfully.
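If the plain ZeRO-2 preset runs out of GPU memory, XTuner also ships a ZeRO-2 + CPU offload preset. A hedged sketch of the relaunch; the preset name `deepspeed_zero2_offload` matches current XTuner releases, but verify against `xtuner train --help` for your version:

```shell
xtuner train ~/Llama3-Tutorial/configs/llama3-llava/llava_llama3_8b_instruct_qlora_clip_vit_large_p14_336_lora_e1_finetune.py \
  --work-dir ~/llama3_llava_pth \
  --deepspeed deepspeed_zero2_offload
```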
Training takes roughly 4 hours.
Convert the saved .pth checkpoints to HuggingFace format:

xtuner convert pth_to_hf ~/Llama3-Tutorial/configs/llama3-llava/llava_llama3_8b_instruct_qlora_clip_vit_large_p14_336_lora_e1_finetune.py \
  ~/model/llama3-llava-iter_2181.pth \
  ~/llama3_llava_pth/pretrain_iter_2181_hf

xtuner convert pth_to_hf ~/Llama3-Tutorial/configs/llama3-llava/llava_llama3_8b_instruct_qlora_clip_vit_large_p14_336_lora_e1_finetune.py \
  ~/llama3_llava_pth/iter_1200.pth \
  ~/llama3_llava_pth/iter_1200_hf
Verifying the model's performance
Question 1: Describe this image. Question 2: What is the equipment in the image?
export MKL_SERVICE_FORCE_INTEL=1
xtuner chat /root/model/Meta-Llama-3-8B-Instruct \
  --visual-encoder /root/model/clip-vit-large-patch14-336 \
  --llava /root/llama3_llava_pth/iter_1200_hf \
  --prompt-template llama3_chat \
  --image /root/tutorial/xtuner/llava/llava_data/test_img/oph.jpg
The original model cannot answer the second question; after fine-tuning, the model answers it correctly.