
LLaMA Factory Notes

ValueError: Cannot merge adapters to a quantized model.

Local environment: CUDA 11.7, torch 2.1.0

1. Project file structure

When fine-tuning with LLaMA Factory, you will mainly be using the files under LLaMA-Factory/src.

2. Directory structure under src
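The exact layout changes between versions, but judging from the scripts and imports used later in these notes, src roughly contains the following (a sketch, not an authoritative listing):

src/
  api.py          # API demo entry script (used below for inference)
  train.py        # training entry script (used below for fine-tuning)
  llamafactory/   # main package: chat, data, extras, hparams, model, train, ...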

Local inference demo

Running api.py from the LLaMA-Factory project directory gives you a web demo.

(You may need to change the permissions of a package under gradio; once it can create a public port/link, it works.)

CUDA_VISIBLE_DEVICES=1 python src/api.py --model_name_or_path LLama/Llama3-8B-Chinese-Chat --template llama3
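If the browser URL does not open, you can first check whether the server itself responds from the terminal. The snippet below is only a sketch under the assumption that api.py starts an OpenAI-compatible server on the default port 8000 with a /v1/chat/completions route; the port, route, and model name may differ in your version.

# Sketch only: assumes an OpenAI-compatible endpoint at localhost:8000 (adjust to your setup).
import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "Llama3-8B-Chinese-Chat",  # placeholder model name
        "messages": [{"role": "user", "content": "Hello!"}],
    },
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])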

After running it I could not open the URL, so for simplicity I went back to the earlier approach and used cli_demo.py, placed under the src directory:

from llamafactory.chat import ChatModel
from llamafactory.extras.misc import torch_gc

try:
    import platform

    if platform.system() != "Windows":
        import readline  # noqa: F401
except ImportError:
    print("Install `readline` for a better experience.")


def main():
    chat_model = ChatModel()
    messages = []
    print("Welcome to the CLI application, use `clear` to remove the history, use `exit` to exit the application.")

    while True:
        try:
            query = input("\nUser: ")
        except UnicodeDecodeError:
            print("Detected decoding error at the inputs, please set the terminal encoding to utf-8.")
            continue
        except Exception:
            raise

        if query.strip() == "exit":
            break

        if query.strip() == "clear":
            messages = []
            torch_gc()
            print("History has been removed.")
            continue

        messages.append({"role": "user", "content": query})
        print("Assistant: ", end="", flush=True)

        response = ""
        for new_text in chat_model.stream_chat(messages):
            print(new_text, end="", flush=True)
            response += new_text
        print()
        messages.append({"role": "assistant", "content": response})


if __name__ == "__main__":
    main()
CUDA_VISIBLE_DEVICES=0 python src/cli_demo.py --model_name_or_path <path to your model> --template <depends on the model; see the README on GitHub>

Problem encountered: with an older torch version there is a BFloat16 error (it failed at first with 2.0.1); upgrading to 2.1.0 fixed it.

According to the PyTorch website, 2.1.0 requires at least CUDA 11.8, so just upgrade to that build; installing via conda is usually faster.
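A minimal sketch of the upgrade command, assuming the official pytorch/nvidia conda channels (adjust versions to match your environment):

conda install pytorch==2.1.0 pytorch-cuda=11.8 -c pytorch -c nvidia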

The chat can then be used and demonstrated directly on the command line.

=============   The above applies to the latest LLaMA Factory version as of 2024.05.29   =====================

Local fine-tuning:

For fine-tuning, the main thing is to run train.py, but you need to pass some arguments, such as the model path, the dataset, and the fine-tuning method.

Contents of train.py:

from llamafactory.train.tuner import run_exp


def main():
    run_exp()


def _mp_fn(index):
    # For xla_spawn (TPUs)
    run_exp()


if __name__ == "__main__":
    main()

As you can see, train.py simply calls into llamafactory.train.tuner, so let's take a further look at the directory structure of the llamafactory package.

Structure of llamafactory/train:
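Judging from the imports in tuner.py below, llamafactory/train roughly contains one subpackage per training stage plus tuner.py (a sketch; the exact layout may differ by version):

llamafactory/train/
  dpo/
  kto/
  ppo/
  pt/
  rm/
  sft/
  tuner.py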

The content of tuner.py is shown below (it uses Python relative imports; see: python 相对导入-CSDN博客):

from typing import TYPE_CHECKING, Any, Dict, List, Optional

import torch
from transformers import PreTrainedModel

from ..data import get_template_and_fix_tokenizer
from ..extras.callbacks import LogCallback
from ..extras.logging import get_logger
from ..hparams import get_infer_args, get_train_args
from ..model import load_model, load_tokenizer
from .dpo import run_dpo
from .kto import run_kto
from .ppo import run_ppo
from .pt import run_pt
from .rm import run_rm
from .sft import run_sft

if TYPE_CHECKING:
    from transformers import TrainerCallback

logger = get_logger(__name__)


def run_exp(args: Optional[Dict[str, Any]] = None, callbacks: List["TrainerCallback"] = []) -> None:
    model_args, data_args, training_args, finetuning_args, generating_args = get_train_args(args)
    callbacks.append(LogCallback(training_args.output_dir))

    if finetuning_args.stage == "pt":
        run_pt(model_args, data_args, training_args, finetuning_args, callbacks)
    elif finetuning_args.stage == "sft":
        run_sft(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
    elif finetuning_args.stage == "rm":
        run_rm(model_args, data_args, training_args, finetuning_args, callbacks)
    elif finetuning_args.stage == "ppo":
        run_ppo(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
    elif finetuning_args.stage == "dpo":
        run_dpo(model_args, data_args, training_args, finetuning_args, callbacks)
    elif finetuning_args.stage == "kto":
        run_kto(model_args, data_args, training_args, finetuning_args, callbacks)
    else:
        raise ValueError("Unknown task.")


def export_model(args: Optional[Dict[str, Any]] = None) -> None:
    model_args, data_args, finetuning_args, _ = get_infer_args(args)

    if model_args.export_dir is None:
        raise ValueError("Please specify `export_dir` to save model.")

    if model_args.adapter_name_or_path is not None and model_args.export_quantization_bit is not None:
        raise ValueError("Please merge adapters before quantizing the model.")

    tokenizer_module = load_tokenizer(model_args)
    tokenizer = tokenizer_module["tokenizer"]
    processor = tokenizer_module["processor"]
    get_template_and_fix_tokenizer(tokenizer, data_args.template)
    model = load_model(tokenizer, model_args, finetuning_args)  # must after fixing tokenizer to resize vocab

    if getattr(model, "quantization_method", None) and model_args.adapter_name_or_path is not None:
        raise ValueError("Cannot merge adapters to a quantized model.")

    if not isinstance(model, PreTrainedModel):
        raise ValueError("The model is not a `PreTrainedModel`, export aborted.")

    if getattr(model, "quantization_method", None) is None:  # cannot convert dtype of a quantized model
        output_dtype = getattr(model.config, "torch_dtype", torch.float16)
        setattr(model.config, "torch_dtype", output_dtype)
        model = model.to(output_dtype)
    else:
        setattr(model.config, "torch_dtype", torch.float16)

    model.save_pretrained(
        save_directory=model_args.export_dir,
        max_shard_size="{}GB".format(model_args.export_size),
        safe_serialization=(not model_args.export_legacy_format),
    )
    if model_args.export_hub_model_id is not None:
        model.push_to_hub(
            model_args.export_hub_model_id,
            token=model_args.hf_hub_token,
            max_shard_size="{}GB".format(model_args.export_size),
            safe_serialization=(not model_args.export_legacy_format),
        )

    try:
        tokenizer.padding_side = "left"  # restore padding side
        tokenizer.init_kwargs["padding_side"] = "left"
        tokenizer.save_pretrained(model_args.export_dir)
        if model_args.export_hub_model_id is not None:
            tokenizer.push_to_hub(model_args.export_hub_model_id, token=model_args.hf_hub_token)

        if model_args.visual_inputs and processor is not None:
            getattr(processor, "image_processor").save_pretrained(model_args.export_dir)
            if model_args.export_hub_model_id is not None:
                getattr(processor, "image_processor").push_to_hub(
                    model_args.export_hub_model_id, token=model_args.hf_hub_token
                )
    except Exception:
        logger.warning("Cannot save tokenizer, please copy the files manually.")

As you can see, it contains two functions:

  1. run_exp(): selects a different training routine depending on the arguments passed in (a programmatic launch sketch follows this list).

  2. export_model(): merges the original model with the fine-tuned checkpoint (adapter); a merge sketch, related to the error in the title of these notes, follows the summary below.
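Since run_exp() accepts an optional dict of arguments (see its signature in tuner.py above), training can also be launched programmatically instead of through the command line. A minimal sketch with placeholder values; the keys mirror the command-line flags used in the training script further below:

from llamafactory.train.tuner import run_exp

# Placeholder values: replace the model path, dataset name and output dir with your own.
run_exp({
    "stage": "sft",
    "do_train": True,
    "model_name_or_path": "path/to/your/model",
    "dataset": "your_dataset",
    "dataset_dir": "data",
    "template": "default",
    "finetuning_type": "lora",
    "lora_target": "q_proj,v_proj",
    "output_dir": "path/to/output/adapter",
    "per_device_train_batch_size": 2,
    "gradient_accumulation_steps": 8,
    "learning_rate": 5e-5,
    "num_train_epochs": 1.0,
    "fp16": True,
})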

That basically completes the walk-through of the overall flow; for the details of each fine-tuning method, look inside the corresponding function.
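One more practical note on export_model(), since it is what raises the error in the title of these notes: load_model() loads the base model, and if that base model is quantized while adapter_name_or_path is set, the function raises "Cannot merge adapters to a quantized model". So the LoRA adapter has to be merged into the unquantized base weights, and (per the earlier check) quantizing the export has to happen as a separate step. A minimal merge sketch with placeholder paths, using only argument names that appear in the code above or in the training script below:

from llamafactory.train.tuner import export_model

# Placeholder paths: merge the LoRA adapter into the *unquantized* base model.
export_model({
    "model_name_or_path": "path/to/your/base/model",
    "adapter_name_or_path": "path/to/your/lora/adapter",
    "template": "default",
    "finetuning_type": "lora",
    "export_dir": "path/to/merged/model",
    "export_size": 2,  # max shard size in GB, see the save_pretrained call above
})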

=======================  The above: 2024/05/27  ========================

How do you actually start fine-tuning?

Write a script train.sh, place it in the llama-factory root directory, and run `bash train.sh` in the terminal:

CUDA_VISIBLE_DEVICES=0 python src/train.py \
    --stage sft \
    --do_train True \
    --model_name_or_path <path to your model> \
    --finetuning_type lora \
    --template default \
    --flash_attn auto \
    --dataset_dir data \
    --dataset <your dataset name> \
    --cutoff_len 1024 \
    --learning_rate 5e-05 \
    --num_train_epochs 1.0 \
    --max_samples 100000 \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 8 \
    --lr_scheduler_type cosine \
    --max_grad_norm 1.0 \
    --logging_steps 5 \
    --save_steps 100 \
    --warmup_steps 0 \
    --optim adamw_torch \
    --report_to none \
    --output_dir <output directory for the fine-tuned adapter> \
    --fp16 True \
    --lora_rank 8 \
    --lora_alpha 16 \
    --lora_dropout 0 \
    --lora_target q_proj,v_proj \
    --plot_loss True

Specific parameters such as batch_size and lora_rank need to be chosen for your own setup. (With the values above, the effective batch size is per_device_train_batch_size × gradient_accumulation_steps = 2 × 8 = 16.)

Inference:

CUDA_VISIBLE_DEVICES=0 python src/cli_demo.py \
    --model_name_or_path <path to the base model> \
    --adapter_name_or_path <path to the trained adapter> \
    --template <prompt template matching the model>
That is all it takes!

Note: I have not used the vLLM framework for now; using it may run into more problems.
