1. Download the Qwen-14B-Chat-Int4 model
git clone https://www.modelscope.cn/qwen/Qwen-14B-Chat-Int4.git
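If cloning the large weight files over git is slow, the ModelScope SDK can fetch the same snapshot from Python. A minimal sketch, assuming the modelscope package is installed (the cache_dir value here is illustrative, not from the original post):

from modelscope import snapshot_download

# Downloads the same snapshot that git clone would fetch; adjust
# cache_dir to wherever you keep models.
model_dir = snapshot_download('qwen/Qwen-14B-Chat-Int4',
                              cache_dir='/mnt/workspace/Qwen/model')
print('model downloaded to:', model_dir)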
2. Download flash-attention
git clone https://github.com/Dao-AILab/flash-attention
3. Download the Qwen code
git clone https://github.com/QwenLM/Qwen.git
Then copy the finetune folder plus web_demo.py and finetune.py (one folder and two .py files) from the Qwen repo into the model folder; a small copy sketch follows below.
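A minimal sketch of that copy step in Python; the source and destination paths are assumptions based on the paths used later in this walkthrough:

import shutil

SRC = '/mnt/workspace/Qwen'                           # cloned Qwen repo (assumed location)
DST = '/mnt/workspace/Qwen/model/Qwen-14B-Chat-Int4'  # model folder

# One folder and two .py files, as described above.
shutil.copytree(f'{SRC}/finetune', f'{DST}/finetune', dirs_exist_ok=True)
shutil.copy(f'{SRC}/web_demo.py', DST)
shutil.copy(f'{SRC}/finetune.py', DST)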
4. Build and install flash-attention:

cd flash-attention
pip install .
# The installs below are optional and may build slowly.
pip install csrc/layer_norm
# If your flash-attn version is above 2.1.1, the install below is not needed.
pip install csrc/rotary
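To confirm the build succeeded and to decide whether the csrc/rotary step applies to you, a quick check (a sketch; assumes the packaging package is available):

import flash_attn
from packaging import version

print('flash-attn version:', flash_attn.__version__)
# Matches the note above: csrc/rotary only needs a separate build <= 2.1.1.
if version.parse(flash_attn.__version__) > version.parse('2.1.1'):
    print('csrc/rotary build not needed for this version')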
5. In the finetune folder, edit finetune_qlora_single_gpu.sh and point the MODEL, DATA, and python paths at your own locations:
MODEL="/mnt/workspace/Qwen/model/Qwen-14B-Chat-Int4" # Set the path if you do not want to load from huggingface directly
# ATTENTION: specify the path to your training data, which should be a json file consisting of a list of conversations.
# See the section for finetuning in README for more information.
DATA="/mnt/workspace/Qwen/model/Qwen-14B-Chat-Int4/finetune/total36-67.json"

python /mnt/workspace/Qwen/model/Qwen-14B-Chat-Int4/finetune.py \
    --deepspeed ds_config_zero2.json
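As the comment in the script says, DATA must be a JSON file containing a list of conversations. A minimal sketch of one record in the schema Qwen's fine-tuning README describes (the text values are placeholders):

import json

# One conversation in the format Qwen's finetune.py expects.
samples = [
    {
        "id": "identity_0",
        "conversations": [
            {"from": "user", "value": "Hello"},
            {"from": "assistant", "value": "Hi! How can I help you today?"},
        ],
    }
]

with open("total36-67.json", "w", encoding="utf-8") as f:
    json.dump(samples, f, ensure_ascii=False, indent=2)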
6. In the output_qwen folder, edit adapter_config.json and set base_model_name_or_path to the local base model path:
{
  "alpha_pattern": {},
  "auto_mapping": null,
  "base_model_name_or_path": "/mnt/workspace/Qwen/model/Qwen-14B-Chat-Int4",
  "bias": "none",
  "fan_in_fan_out": false,
  "inference_mode": true,
  "init_lora_weights": true,
  "layers_pattern": null,
  "layers_to_transform": null,
  "loftq_config": {},
  "lora_alpha": 16,
  "lora_dropout": 0.05,
  "megatron_config": null,
  "megatron_core": "megatron.core",
  "modules_to_save": null,
  "peft_type": "LORA",
  "r": 64,
  "rank_pattern": {},
  "revision": null,
  "target_modules": [
    "w2",
    "c_proj",
    "c_attn",
    "w1"
  ],
  "task_type": "CAUSAL_LM"
}
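If you prefer not to hand-edit the file, the same change can be scripted; a sketch assuming the output_qwen path used later in this post:

import json

cfg_path = '/mnt/workspace/Qwen/model/Qwen-14B-Chat-Int4/finetune/output_qwen/adapter_config.json'
with open(cfg_path, encoding='utf-8') as f:
    cfg = json.load(f)

# Point the adapter at the local base model instead of a hub ID.
cfg['base_model_name_or_path'] = '/mnt/workspace/Qwen/model/Qwen-14B-Chat-Int4'

with open(cfg_path, 'w', encoding='utf-8') as f:
    json.dump(cfg, f, ensure_ascii=False, indent=2)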

7. Place the JSON training data file in this directory.
8. Run single-GPU training:
bash finetune_qlora_single_gpu.sh
After fine-tuning completes successfully:

9. Edit config.json and add "disable_exllama": true under quantization_config. (Without this flag, loading the GPTQ model together with a PEFT adapter can fail, since the ExLlama GPTQ kernels support plain inference only.)
- "quantization_config": {
- "bits": 4,
- "group_size": 128,
- "damp_percent": 0.01,
- "desc_act": false,
- "static_groups": false,
- "sym": true,
- "true_sequential": true,
- "model_name_or_path": null,
- "model_file_base_name": "model",
- "quant_method": "gptq",
- "disable_exllama":true
- },
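As with step 6, this edit can be scripted rather than done by hand; a sketch under the same path assumptions:

import json
from pathlib import Path

cfg_path = Path('/mnt/workspace/Qwen/model/Qwen-14B-Chat-Int4/config.json')
cfg = json.loads(cfg_path.read_text(encoding='utf-8'))

# Add the flag under the existing quantization_config block.
cfg['quantization_config']['disable_exllama'] = True

cfg_path.write_text(json.dumps(cfg, ensure_ascii=False, indent=2),
                    encoding='utf-8')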
10. In web_demo.py, add the block below and comment out the original AutoModelForCausalLM call:
# model = AutoModelForCausalLM.from_pretrained(
#     args.checkpoint_path,
#     device_map=device_map,
#     trust_remote_code=True,
#     resume_download=True,
# ).eval()
from peft import AutoPeftModelForCausalLM

model = AutoPeftModelForCausalLM.from_pretrained(
    # path to the folder produced by fine-tuning
    '/mnt/workspace/Qwen/model/Qwen-14B-Chat-Int4/finetune/output_qwen',
    device_map="auto",
    trust_remote_code=True
).eval()
Make sure the previous model-loading code stays commented out, as shown above. A quick standalone smoke test of the adapter follows.
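Before wiring the adapter into the web demo, it can be worth verifying that it loads and answers on its own. A minimal sketch, assuming the paths above and the chat() helper that Qwen-Chat models expose through their remote code:

from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

BASE = '/mnt/workspace/Qwen/model/Qwen-14B-Chat-Int4'
ADAPTER = f'{BASE}/finetune/output_qwen'

tokenizer = AutoTokenizer.from_pretrained(BASE, trust_remote_code=True)
model = AutoPeftModelForCausalLM.from_pretrained(
    ADAPTER, device_map='auto', trust_remote_code=True
).eval()

# chat() comes from Qwen's remote code; PEFT forwards the call to the base model.
response, history = model.chat(tokenizer, 'Hello', history=None)
print(response)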
11. Also in web_demo.py, update DEFAULT_CKPT_PATH and the path passed to AutoPeftModelForCausalLM.from_pretrained:
DEFAULT_CKPT_PATH = '/mnt/workspace/Qwen/model/Qwen-14B-Chat-Int4'

model = AutoPeftModelForCausalLM.from_pretrained(
    # path to the folder produced by fine-tuning
    '/mnt/workspace/Qwen/model/Qwen-14B-Chat-Int4/finetune/output_qwen',
    device_map="auto",
    trust_remote_code=True
).eval()
12. Launch the demo:
python web_demo.py
Note: every path above should be adjusted to wherever you actually downloaded the files.

When you finally run python web_demo.py, install whatever package an import error reports as missing. If the page then errors whenever you ask a question and a traceback appears in the console, you may have hit a Gradio version incompatibility; switch to a compatible version (e.g. pip install gradio==3.40.0, the version that worked for me, which you can confirm with the snippet below). Hopefully this saves you from the same pitfall.
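A tiny check of the installed Gradio version (3.40.0 is what worked here; treat other versions as untested):

import gradio

# The web demo in this walkthrough was verified against Gradio 3.40.0.
print('gradio version:', gradio.__version__)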