
Qwen1.5 Internals: the Qwen1.5 Prompt Format

        This post fine-tunes the Qwen1.5-7B-Chat model, focusing on how the model constructs its prompt, in particular how single-turn and multi-turn conversations are handled. Only by understanding these mechanics can we adjust them to our own needs. The sections below walk through how the prompt is built during fine-tuning.

Environment Setup

        The environment used for fine-tuning and inference with Qwen1.5-7B is as follows:

    # python==3.10.13
    torch==2.0.0
    transformers==4.37.0
    deepspeed==0.12.6
    peft==0.7.1
    accelerate==0.28.0
    loguru==0.7.2
    wandb==0.16.5
    scikit-learn==1.4.1.post1

Fine-tuning Stage

        We first step through the official open-source code with a debugger, then explain how the model works, laying the groundwork for later modifications and optimizations.

How the Official Code Works

Data Format

        The samples below are chosen purely to probe how the model constructs its prompt; only once the mechanics are clear can we later modify them to fit our own requirements.

        Note: the field contents can be replaced freely, or you can take a public dataset and convert it into this format.

    from typing import Dict, List

    messages: List[List[Dict]] = [
        [
            {"role": "user", "content": "选走红绿灯最少"},
            {"role": "assistant", "content": "<导航>"}
        ],
        [
            {
                "role": "system",
                "content": "你是一个擅长猜测人类意图的人工智能助手,下面有选项供你选择用户的意图,请选择:\nA. <通用>\nB. <媒体>\nC. <系统控制>\nD. <天气>\nE. <车控>\nF. <导航>\nG. <蓝牙电话>"
            },
            {"role": "user", "content": "选走红绿灯最少"},
            {"role": "assistant", "content": "<导航>"}
        ],
        [
            {
                "role": "system",
                "content": "你是一个擅长猜测人类意图的人工智能助手,下面有选项供你选择用户的意图,请选择:\nA. <通用>\nB. <媒体>\nC. <系统控制>\nD. <天气>\nE. <车控>\nF. <导航>\nG. <蓝牙电话>"
            },
            {"role": "user", "content": "查询下明天天气温度"},
            {"role": "assistant", "content": "<天气>"},
            {"role": "user", "content": "你协助我开电台"},
            {"role": "assistant", "content": "<媒体>"},
            {"role": "user", "content": "开启最大除霜"},
            {"role": "assistant", "content": "<车控>"},
            {"role": "user", "content": "右车窗开开一下"},
            {"role": "assistant", "content": "<车控>"}
        ],
    ]

Fine-tuning Code

        The following code shows how the sentences are run through the tokenizer.

    # Imports
    from typing import Dict, List

    import torch
    import transformers
    from transformers import AutoTokenizer

    # Load the tokenizer
    tokenizer = AutoTokenizer.from_pretrained(
        r"D:\Company\Code\LLM\Qwen1.5-7B-Chat",
        cache_dir=None,
        model_max_length=512,
        padding_side="right",
        use_fast=False,
    )

    # Convert the chat messages into token ids
    def preprocess(
        messages: List[List[Dict[str, str]]],
        tokenizer: transformers.PreTrainedTokenizer,
        max_len: int,
    ) -> Dict:
        texts = []
        for i, msg in enumerate(messages):
            texts.append(
                tokenizer.apply_chat_template(
                    msg,
                    tokenize=True,
                    add_generation_prompt=False,
                    padding="max_length",  # pad every sample to max_len so they can be stacked
                    max_length=max_len,
                    truncation=True,
                )
            )
        # Token ids produced by the chat template
        input_ids = torch.tensor(texts, dtype=torch.int)
        target_ids = input_ids.clone()
        target_ids[target_ids == tokenizer.pad_token_id] = -100
        print(input_ids)
        print(target_ids)
        # Decode back to see where the truncation happened
        after_target_ids = [tokenizer.decode(i) for i in texts]
        print(after_target_ids)

    # Test
    if __name__ == "__main__":
        preprocess(messages=messages, tokenizer=tokenizer, max_len=52)

Fine-tuning Takeaways

  1. If a conversation does not provide a system prompt, one is added automatically, as in examples 1 and 2 of the data above;
  2. If a conversation is longer than the configured threshold, it is truncated, which can leave the fine-tuning text semantically incomplete, as in example 3 above;
  3. After tokenization the prompt is laid out as follows (a quick way to print the rendered template is sketched after this list):
    1. Single-turn: <|im_start|>system\nsystem_prompt<|im_end|>\n<|im_start|>user\nquery<|im_end|>\n<|im_start|>assistant\nanswer<|endoftext|><|endoftext|>.....
    2. Multi-turn: <|im_start|>system\nsystem_prompt<|im_end|>\n<|im_start|>user\nquery<|im_end|>\n<|im_start|>assistant\nanswer<|im_end|>\n<|im_start|>user\nquery<|im_end|>\n<|im_start|>assistant\nanswer<|im_end|>\n<|im_start|>user\nquery<|im_end|>\n<|im_start|>assistant\nanswer<|im_end|>\n<|im_start|>user\nquery<|im_end|>\n<|im_start|>assistant\nanswer<|endoftext|><|endoftext|>.....
          Note: system_prompt stands for the system message, query for the user query, and answer for the assistant reply.
  4. For the labels, only positions holding the pad token id (<|endoftext|>) are set to -100; everything else is left untouched.
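
        As a quick sanity check on the template layout above, the rendered prompt can be printed with the tokenizer's built-in apply_chat_template and tokenize=False. A minimal sketch, assuming Qwen1.5-7B-Chat is reachable via the Hub id shown (substitute a local path such as the one used earlier if needed):

    from transformers import AutoTokenizer

    # Hub id used for illustration; a local copy of Qwen1.5-7B-Chat works the same way.
    tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen1.5-7B-Chat")

    single_turn = [
        {"role": "user", "content": "选走红绿灯最少"},
        {"role": "assistant", "content": "<导航>"},
    ]

    # tokenize=False returns the raw ChatML string, so the <|im_start|>/<|im_end|>
    # markers and the automatically inserted default system prompt are easy to see.
    print(tokenizer.apply_chat_template(single_turn, tokenize=False, add_generation_prompt=False))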

Fine-tuning with Our Own Code

        Stepping through the official code reveals the following issue:

  1. During fine-tuning, if the length of input_ids = system + user_tokens + value_ids exceeds model_max_length, the sequence is truncated and the text becomes semantically incomplete. A model trained on broken inputs like this is unlikely to perform well. For example:

    # truncated token ids
    [151644, 8948, 198, 56568, 101909, 107618, 109736, 103971, 111450, 100623, 48692, 100168, 110498, 3837, 100431, 18830, 109487, 83744, 56568, 50404, 107494, 111450, 37945, 50404, 28311, 32, 13, 366, 105600, 397]
    # decoded text: cut off in the middle of the system prompt
    "<|im_start|>system\n你是一个擅长猜测人类意图的人工智能助手,下面有选项供你选择用户的意图,请选择:\nA. <通用>"

        Based on the problem above and my own understanding of large language models, here is the proposed solution:

  1. To handle overly long texts, single-turn and multi-turn dialogues need to be treated differently:
    1. for a single-turn dialogue, the blunt but simple option is to drop the sample entirely;
    2. for a multi-turn dialogue, keep only the last few rounds, i.e. system + [query, answer, query, answer], so that the remaining dialogue stays semantically complete.
  2. Once the prompt construction is understood, the next step is to implement it ourselves: convert the text into ids and feed them to the model for fine-tuning (the implementation follows in the Data Construction section below).

Reusable Utility Functions

        The data loading and saving code is model-agnostic and can be reused with other LLMs, so it is worth wrapping it in a small utility module that can simply be copied over whenever it is needed. (A get_paths helper, used by the later scripts to collect file paths under a data directory, lives alongside these functions but is not reproduced here.)

    from typing import List, Dict, Union
    import json
    import os


    def loads(path: str) -> List[Dict]:
        """Read a JSON Lines file: one JSON object per line.

        Args:
            path (str): path to a .jsonl file.
        Returns:
            List[Dict]: the parsed objects.
        """
        datas: List[Dict] = []
        with open(path, mode="r", encoding="UTF-8") as fr:
            for line in fr:
                datas.append(json.loads(line))
        return datas


    def load(path: str) -> List[Dict]:
        """Read a regular JSON file.

        Args:
            path (str): path to a .json file.
        Returns:
            List[Dict]: the parsed content.
        """
        with open(path, mode="r", encoding="UTF-8") as fr:
            data = json.load(fr)
        return data


    def read_datas(paths: Union[str, List[str]] = None) -> List[List[Dict]]:
        """Read one or more .json / .jsonl files.

        Args:
            paths (Union[str, List[str]], optional): a single path or a list of paths.
        Returns:
            List[List[Dict]]: one list of objects per file.
        """
        if not paths:
            return []
        if isinstance(paths, str):
            paths = [paths]
        datas: List[List[Dict]] = []
        for path in paths:
            fold, suffix = os.path.splitext(path)
            if ".json" == suffix:
                datas.append(load(path))
            elif ".jsonl" == suffix:
                datas.append(loads(path))
        return datas


    def dump(path: str, datas: List[Dict]):
        """Write data to a single JSON file, creating parent directories if needed.

        Args:
            path (str): output path.
            datas (List[Dict]): objects to write.
        """
        prex = os.path.dirname(path)
        if prex:
            os.makedirs(prex, exist_ok=True)
        with open(path, "w", encoding="UTF-8") as fw:
            json.dump(datas, fw, ensure_ascii=False, indent=4)


    def dumps(path: str, datas: List[Dict]):
        """Write data as JSON Lines, one object per line, creating parent directories if needed.

        Args:
            path (str): output path.
            datas (List[Dict]): objects to write.
        """
        prex = os.path.dirname(path)
        if prex:
            os.makedirs(prex, exist_ok=True)
        with open(path, "w", encoding="UTF-8") as fw:
            for obj in datas:
                fw.write(json.dumps(obj, ensure_ascii=False) + "\n")

Data Construction

        Different models expect slightly different data formats; the simplest approach is to convert your own data into the format the official code expects.

    from loguru import logger
    from typing import Dict, List, Any, Tuple
    import torch
    from torch.utils.data import Dataset
    import transformers
    from transformers.trainer_pt_utils import LabelSmoother
    from transformers import AutoTokenizer
    from transformers import PreTrainedTokenizer

    from argument import DataArguments
    from datas import com


    def dialog(
        tokenizer: PreTrainedTokenizer,
        messages: List[Dict[str, str]],
        model_max_length: int,
    ) -> List[int]:
        def _parse_messages(
            messages: List[Dict[str, str]], split_role: str = "user"
        ) -> Tuple[str, List[List[Dict[str, str]]]]:
            """Split a message list into the system prompt and a list of rounds.

            Args:
                messages: List[Dict[str, str]]
                split_role: the role that starts a new round, normally "user"
            Returns:
                Tuple[str, List[List[Dict[str, str]]]]: the system prompt and the rounds.
            Example:
                >>> messages = [{'role': 'system', 'content': '你是一个人工智能助理'}, {'role': 'user', 'content': '你好'}, {'role': 'assistant', 'content': '你好啊'}, ...]
                >>> '你是一个人工智能助理', [[{'role': 'user', 'content': '你好'}, {'role': 'assistant', 'content': '你好啊'}], ...]
            """
            system, rounds = "", []
            round = []
            for i, message in enumerate(messages):
                if message["role"] == "system":
                    assert i == 0
                    system = message["content"]
                    continue
                # a round is only appended to rounds once it is complete
                if message["role"] == split_role and round:
                    rounds.append(round)
                    round = []
                round.append(message)
            if round:  # append the trailing round as well
                rounds.append(round)
            return system, rounds

        system, rounds = _parse_messages(messages)
        system_ids = tokenizer.encode(f"<|im_start|>system\n{system}<|im_end|>\n")
        input_ids: List[int] = []
        # for multi-turn dialogues keep only the last few rounds, i.e. [n:]
        for i, round in enumerate(rounds[::-1]):  # iterate from the last round backwards
            # one complete round
            text_id = []
            for message in round:
                role, content = message["role"], message["content"]
                if role == "user":
                    cont = f"<|im_start|>user\n{content}<|im_end|>\n"
                else:
                    # the final assistant turn is handled specially (no <|im_end|>, padding follows)
                    if role == "assistant" and i == 0:
                        cont = f"<|im_start|>assistant\n{content}"
                    else:
                        cont = f"<|im_start|>assistant\n{content}<|im_end|>\n"
                # user + assistant
                text_id = text_id + tokenizer.encode(cont)
            # keep the round only if adding it does not push past the model's maximum length
            if len(system_ids + input_ids + text_id) > model_max_length:
                break
            else:
                input_ids = text_id + input_ids
        # pad input_ids up to model_max_length
        pad = (model_max_length - len(system_ids + input_ids)) * tokenizer.encode(
            "<|endoftext|>"
        )
        return system_ids + input_ids + pad


    def preprocess(
        tokenizer: PreTrainedTokenizer,
        messages: List[List[Dict[str, str]]],
        model_max_length: int,
    ) -> Dict[str, torch.Tensor]:
        texts_ids: List[List[int]] = []
        for msg in messages:
            text_id: List[int] = dialog(tokenizer, msg, model_max_length)
            texts_ids.append(text_id)
        input_ids = torch.tensor(texts_ids, dtype=torch.int)
        target_ids = input_ids.clone()
        target_ids[target_ids == tokenizer.pad_token_id] = LabelSmoother.ignore_index
        attention_mask = input_ids.ne(tokenizer.pad_token_id)  # True or False
        return {
            "input_ids": input_ids,
            "target_ids": target_ids,
            "attention_mask": attention_mask,
        }


    class SupervisedDataset(Dataset):
        def __init__(
            self,
            raw_data: List[List[Dict[str, Any]]],
            tokenizer: PreTrainedTokenizer,
            model_max_length: int,
        ):
            super().__init__()
            self.tokenizer = tokenizer
            # normalise the raw data into message lists
            self.messages: List[List[Dict[str, str]]] = self.format(raw_data)
            # convert the text into ids
            data_dict: Dict[str, torch.Tensor] = preprocess(
                tokenizer, self.messages, model_max_length
            )
            self.input_ids = data_dict["input_ids"]
            self.target_ids = data_dict["target_ids"]
            self.attention_mask = data_dict["attention_mask"]

        def __len__(self):
            return len(self.input_ids)

        def __getitem__(self, i) -> Dict[str, torch.Tensor]:
            input_ids = self.input_ids[i]
            target_ids = self.target_ids[i]
            attention_mask = self.attention_mask[i]
            if i == 0:
                target = [i for i in target_ids.tolist() if i != -100]
                logger.debug(
                    f"text: {self.tokenizer.decode(input_ids.tolist())}\n{input_ids.tolist()}"
                )
                logger.debug(f"label: {self.tokenizer.decode(target)}\n{target}")
            return {
                "input_ids": input_ids,
                "labels": target_ids,
                "attention_mask": attention_mask,
            }

        def format(self, datas: List[List[Dict[str, Any]]]) -> List[List[Dict[str, str]]]:
            lis = []
            for data in datas:
                for message in data:
                    lis.append(
                        [
                            {"role": "system", "content": message["system"]},
                            {"role": "user", "content": message["query"]},
                            {"role": "assistant", "content": message["answer"]},
                        ]
                    )
            return lis


    def make_supervised_data_module(
        tokenizer: PreTrainedTokenizer,
        data_args: DataArguments,
        model_max_length: int,
    ) -> Dict[str, SupervisedDataset]:
        logger.info("Loading data...")
        train_data = com.read_datas(com.get_paths(data_args.data_path))
        eval_data = com.read_datas(com.get_paths(data_args.eval_data_path))
        logger.info("data loading finished")
        train_dataset = SupervisedDataset(train_data, tokenizer, model_max_length)
        eval_dataset = SupervisedDataset(eval_data, tokenizer, model_max_length)
        return {"train_dataset": train_dataset, "eval_dataset": eval_dataset}


    if __name__ == "__main__":
        parser = transformers.HfArgumentParser((DataArguments,))
        (data_args,) = parser.parse_args_into_dataclasses()
        tokenizer = AutoTokenizer.from_pretrained(
            "/root/.cache/modelscope/hub/qwen/Qwen1___5-7B-Chat",
            cache_dir=None,
            model_max_length=1024,
            padding_side="right",
            use_fast=False,
        )
        data_module = make_supervised_data_module(
            tokenizer=tokenizer,
            data_args=data_args,
            model_max_length=1024,
        )
        for data in data_module["train_dataset"]:
            logger.debug(data)

Argument Settings

        These arguments drive fine-tuning: the learning rate, the training and evaluation data paths, the path to the base model, and so on. They can usually be set either in a *.sh launch script or directly in a Python file.

    from dataclasses import dataclass, field
    from typing import Optional, List
    import transformers


    @dataclass
    class DataArguments:
        data_path: Optional[str] = field(
            default="/root/autodl-tmp/Qwen-our/datas/handled/devs/battle_death",
            metadata={"help": "Path to the training data."},
        )
        eval_data_path: Optional[str] = field(
            default="/root/autodl-tmp/Qwen-our/datas/handled/trains/activity_1",
            metadata={"help": "Path to the evaluation data."},
        )


    @dataclass
    class ModelArguments:
        model_name_or_path: Optional[str] = field(
            default="/root/.cache/modelscope/hub/qwen/Qwen1___5-7B-Chat"
        )


    @dataclass
    class TrainingArguments(transformers.TrainingArguments):
        output_dir: Optional[str] = field(
            default="/root/autodl-tmp/Qwen-our/sft",
            metadata={"help": "model output"},
        )
        cache_dir: Optional[str] = field(default=None)
        optim: str = field(default="adamw_torch")
        model_max_length: int = field(
            default=1024,
            metadata={
                "help": "Maximum sequence length. Sequences will be right padded (and possibly truncated)."
            },
        )
        bf16: bool = field(default=True)
        num_train_epochs: int = field(default=1)
        per_device_train_batch_size: int = field(default=2)
        per_device_eval_batch_size: int = field(default=2)
        gradient_accumulation_steps: int = field(default=2)
        evaluation_strategy: str = field(default="epoch")
        save_strategy: str = field(default="epoch")
        save_total_limit: int = field(default=3)
        learning_rate: float = field(default=5e-5)
        weight_decay: float = field(default=0.01)
        adam_beta2: float = field(default=0.95)
        warmup_ratio: float = field(default=0.01)
        lr_scheduler_type: str = field(default="cosine")
        logging_steps: int = field(default=1)
        gradient_checkpointing: bool = field(default=True)
        use_lora: bool = field(default=True)


    @dataclass
    class LoraArguments:
        lora_r: int = 64
        lora_alpha: int = 16
        lora_dropout: float = 0.05
        lora_target_modules: List[str] = field(
            default_factory=lambda: [
                "q_proj",
                "k_proj",
                "v_proj",
                "o_proj",
                "up_proj",
                "gate_proj",
                "down_proj",
            ]
        )
        lora_weight_path: str = ""
        lora_bias: str = "none"
        q_lora: bool = False

LoRA Fine-tuning

        Next comes the actual fine-tuning. There is not much to say here; if you are not very familiar with the model's architecture, avoid casually modifying the official source code, since it is easy to break something and then be left scratching your head.

    import logging
    import os
    # os.environ["CUDA_VISIBLE_DEVICES"] = "2"  # pin training to a specific GPU
    import pathlib

    import torch
    from deepspeed import zero
    from deepspeed.runtime.zero.partition_parameters import ZeroParamStatus
    import transformers
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from transformers import Trainer, BitsAndBytesConfig
    from transformers.integrations import deepspeed
    from peft import LoraConfig, TaskType, get_peft_model, prepare_model_for_kbit_training
    from accelerate.utils import DistributedType

    from argument import DataArguments, ModelArguments, TrainingArguments, LoraArguments
    import dataSets


    def maybe_zero_3(param):
        if hasattr(param, "ds_id"):
            assert param.ds_status == ZeroParamStatus.NOT_AVAILABLE
            with zero.GatheredParameters([param]):
                param = param.data.detach().cpu().clone()
        else:
            param = param.detach().cpu().clone()
        return param


    # Borrowed from peft.utils.get_peft_model_state_dict
    def get_peft_state_maybe_zero_3(named_params, bias):
        if bias == "none":
            to_return = {k: t for k, t in named_params if "lora_" in k}
        elif bias == "all":
            to_return = {k: t for k, t in named_params if "lora_" in k or "bias" in k}
        elif bias == "lora_only":
            to_return = {}
            maybe_lora_bias = {}
            lora_bias_names = set()
            for k, t in named_params:
                if "lora_" in k:
                    to_return[k] = t
                    bias_name = k.split("lora_")[0] + "bias"
                    lora_bias_names.add(bias_name)
                elif "bias" in k:
                    maybe_lora_bias[k] = t
            # keep only the biases that belong to LoRA-adapted modules
            for k, t in maybe_lora_bias.items():
                if k in lora_bias_names:
                    to_return[k] = t
        else:
            raise NotImplementedError
        to_return = {k: maybe_zero_3(v) for k, v in to_return.items()}
        return to_return


    def safe_save_model_for_hf_trainer(
        trainer: transformers.Trainer, output_dir: str, bias="none"
    ):
        """Collects the state dict and dumps it to disk."""
        # check if zero3 mode is enabled
        if deepspeed.is_deepspeed_zero3_enabled():
            state_dict = trainer.model_wrapped._zero3_consolidated_16bit_state_dict()
        else:
            if trainer.args.use_lora:
                state_dict = get_peft_state_maybe_zero_3(
                    trainer.model.named_parameters(), bias
                )
            else:
                state_dict = trainer.model.state_dict()
        if trainer.args.should_save and trainer.args.local_rank == 0:
            trainer._save(output_dir, state_dict=state_dict)


    def train():
        parser = transformers.HfArgumentParser(
            (ModelArguments, DataArguments, TrainingArguments, LoraArguments)
        )
        (
            model_args,
            data_args,
            training_args,
            lora_args,
        ) = parser.parse_args_into_dataclasses()

        # This serves for single-gpu qlora.
        if (
            getattr(training_args, "deepspeed", None)
            and int(os.environ.get("WORLD_SIZE", 1)) == 1
        ):
            training_args.distributed_state.distributed_type = DistributedType.DEEPSPEED

        device_map = None
        world_size = int(os.environ.get("WORLD_SIZE", 1))
        ddp = world_size != 1
        # QLoRA setup
        if lora_args.q_lora:
            device_map = {"": int(os.environ.get("LOCAL_RANK") or 0)} if ddp else "auto"
            if len(training_args.fsdp) > 0 or deepspeed.is_deepspeed_zero3_enabled():
                logging.warning("FSDP or ZeRO3 is incompatible with QLoRA.")

        model_load_kwargs = {
            "low_cpu_mem_usage": not deepspeed.is_deepspeed_zero3_enabled(),
        }
        compute_dtype = (
            torch.float16
            if training_args.fp16
            else (torch.bfloat16 if training_args.bf16 else torch.float32)
        )

        # Load model and tokenizer
        config = transformers.AutoConfig.from_pretrained(
            model_args.model_name_or_path,
            cache_dir=training_args.cache_dir,
        )
        config.use_cache = False
        model = AutoModelForCausalLM.from_pretrained(
            model_args.model_name_or_path,
            config=config,
            cache_dir=training_args.cache_dir,
            device_map=device_map,
            # enable 4-bit quantisation only when both LoRA and QLoRA are requested
            quantization_config=(
                BitsAndBytesConfig(
                    load_in_4bit=True,
                    bnb_4bit_use_double_quant=True,
                    bnb_4bit_quant_type="nf4",
                    bnb_4bit_compute_dtype=compute_dtype,
                )
                if training_args.use_lora and lora_args.q_lora
                else None
            ),
            torch_dtype=torch.bfloat16,  # load the weights in bfloat16
            **model_load_kwargs,
        )
        tokenizer = AutoTokenizer.from_pretrained(
            model_args.model_name_or_path,
            cache_dir=training_args.cache_dir,
            model_max_length=training_args.model_max_length,
            padding_side="right",
            use_fast=False,
        )

        if training_args.use_lora:
            lora_config = LoraConfig(
                r=lora_args.lora_r,
                lora_alpha=lora_args.lora_alpha,
                target_modules=lora_args.lora_target_modules,
                lora_dropout=lora_args.lora_dropout,
                bias=lora_args.lora_bias,
                task_type=TaskType.CAUSAL_LM,
            )
            if lora_args.q_lora:
                model = prepare_model_for_kbit_training(
                    model, use_gradient_checkpointing=training_args.gradient_checkpointing
                )
            model = get_peft_model(model, lora_config)
            # Print peft trainable params
            model.print_trainable_parameters()
            if training_args.gradient_checkpointing:
                model.enable_input_require_grads()

        # Load data
        data_module = dataSets.make_supervised_data_module(
            tokenizer=tokenizer,
            data_args=data_args,
            model_max_length=training_args.model_max_length,
        )

        # Start trainer
        trainer = Trainer(
            model=model,
            tokenizer=tokenizer,
            args=training_args,
            train_dataset=data_module["train_dataset"],
            eval_dataset=data_module["eval_dataset"],
        )

        # `not training_args.use_lora` is a temporary workaround for the issue that there are problems with
        # loading the checkpoint when using LoRA with DeepSpeed.
        # Check this issue https://github.com/huggingface/peft/issues/746 for more information.
        # fine-tune
        if (
            list(pathlib.Path(training_args.output_dir).glob("checkpoint-*"))
            and not training_args.use_lora
        ):
            trainer.train(resume_from_checkpoint=True)
        else:
            trainer.train()
        trainer.save_state()
        safe_save_model_for_hf_trainer(
            trainer=trainer, output_dir=training_args.output_dir, bias=lora_args.lora_bias
        )


    if __name__ == "__main__":
        train()

After Fine-tuning

        Fine-tuning with LoRA produces a small adapter model with relatively few parameters; at inference time both the base model and the LoRA adapter need to be loaded. For serving the fine-tuned model there are two options, described below, and you can pick whichever suits your use case.

Merging the Weights

        Once fine-tuning is done, the base weights and the LoRA weights can be merged into a single new model, which is then loaded exactly like the base model. However, for scenarios that need to switch behaviours frequently, such as role-play, merging is generally not recommended: loading a whole new large model is slow, whereas loading a LoRA adapter is fast.
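
        A minimal merge sketch, reusing the base and LoRA paths from later in this post and writing to a hypothetical merged_path directory; PEFT's merge_and_unload folds the adapter deltas back into the base weights so the result can be saved and loaded like an ordinary checkpoint:

    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    base_model_path = "/root/.cache/modelscope/hub/qwen/Qwen1___5-7B-Chat"
    lora_model_path = "/root/autodl-tmp/Qwen-our/sft/checkpoint-4"
    merged_path = "/root/autodl-tmp/Qwen-our/sft/merged"  # hypothetical output directory

    base = AutoModelForCausalLM.from_pretrained(base_model_path, torch_dtype=torch.bfloat16)
    model = PeftModel.from_pretrained(base, lora_model_path)

    # Fold the LoRA deltas into the base weights and drop the adapter wrappers.
    merged = model.merge_and_unload()
    merged.save_pretrained(merged_path)

    # The tokenizer is unchanged; saving a copy next to the merged weights keeps loading simple.
    AutoTokenizer.from_pretrained(base_model_path).save_pretrained(merged_path)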

Keeping the Weights Separate

        Keeping the weights separate means loading the base model and the LoRA adapter individually. This is also straightforward, but note that because the two are loaded separately, both sets of parameters must sit on the GPU (or CPU) together, otherwise errors will occur.

Inference Stage

Prompt Construction

        The prompt built at inference time differs slightly from the one built during fine-tuning. The main differences are:

  1. Fine-tuning requires labels; inference does not;
  2. During fine-tuning, both single-turn and multi-turn conversations must provide complete [query, answer, ......] pairs, whereas at inference time the final turn only provides the query, i.e. [query, answer, ......, query];
  3. For multi-turn dialogue, the text may exceed the maximum length the model can accept. If it does, only the last few rounds are kept, i.e. [......, query_{n-1}, answer_{n-1}, query_n]. Compared with simply truncating once the maximum length is exceeded, this keeps the dialogue semantically complete. The example below shows a message list and the resulting prompt, and the sketch after it shows the equivalent rendering with the tokenizer's built-in chat template.
    message = [
        {
            "role": "system",
            "content": "你是一个擅长猜测人类意图的人工智能助手,下面有选项供你选择用户的意图,请选择:\nA. <通用>\nB. <媒体>\nC. <系统控制>\nD. <天气>\nE. <车控>\nF. <导航>\nG. <蓝牙电话>"
        },
        {"role": "user", "content": "查询下明天天气温度"},
        {"role": "assistant", "content": "<天气>"},
        {"role": "user", "content": "你协助我开电台"},
        {"role": "assistant", "content": "<媒体>"},
        {"role": "user", "content": "开启最大除霜"},
        {"role": "assistant", "content": "<车控>"},
        {"role": "user", "content": "右车窗开开一下"}
    ]
    # resulting prompt (earlier rounds dropped to respect the length limit)
    """
    <|im_start|>system
    你是一个擅长猜测人类意图的人工智能助手,下面有选项供你选择用户的意图,请选择:
    A. <通用>
    B. <媒体>
    C. <系统控制>
    D. <天气>
    E. <车控>
    F. <导航>
    G. <蓝牙电话><|im_end|>
    <|im_start|>user
    开启最大除霜<|im_end|>
    <|im_start|>assistant
    <车控><|im_end|>
    <|im_start|>user
    右车窗开开一下<|im_end|>
    <|im_start|>assistant
    """

Inference with Merged Weights

        As mentioned above, if the base and LoRA weights were merged into a new model after fine-tuning, it is loaded exactly the same way as the base model.
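
        A minimal sketch of that case, assuming the hypothetical merged_path directory produced by the merge step earlier:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    merged_path = "/root/autodl-tmp/Qwen-our/sft/merged"  # hypothetical merged-model directory

    tokenizer = AutoTokenizer.from_pretrained(merged_path)
    model = AutoModelForCausalLM.from_pretrained(
        merged_path,
        torch_dtype=torch.bfloat16,
        device_map="auto",
    )

        From here on, prompt construction and the generate call are identical to the un-merged case below; no PeftModel wrapper is needed.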

Inference Without Merging

        The alternative to merging the base and LoRA weights into one new model is to keep them separate, in which case both models are loaded individually, as in the script below.

    import torch
    from peft import PeftModel
    from typing import List, Dict, Tuple, Any
    import os
    # os.environ["CUDA_VISIBLE_DEVICES"] = "3"
    import transformers
    from loguru import logger
    from transformers import AutoTokenizer, AutoModelForCausalLM

    from datas import com

    device = torch.device("cuda")
    base_model_path = "/root/.cache/modelscope/hub/qwen/Qwen1___5-7B-Chat"
    lora_model_path = "/root/autodl-tmp/Qwen-our/sft/checkpoint-4"
    test_data_path = "/root/autodl-tmp/Qwen-our/datas/handled/devs"


    def init_model(
        base_model_path: str, lora_model_path: str
    ) -> Tuple[AutoTokenizer, PeftModel]:
        """Load the base model, attach the LoRA adapter, and return (tokenizer, model).

        Args:
            base_model_path (str): path to the base Qwen1.5-7B-Chat model.
            lora_model_path (str): path to the LoRA checkpoint produced by fine-tuning.
        Returns:
            Tuple[AutoTokenizer, PeftModel]: the tokenizer and the adapted model.
        """
        config = transformers.AutoConfig.from_pretrained(
            base_model_path,
            cache_dir=None,
        )
        config.use_cache = False
        base_model = AutoModelForCausalLM.from_pretrained(
            base_model_path,
            config=config,
            cache_dir=None,
            device_map="auto",
            quantization_config=None,
            torch_dtype=torch.bfloat16,
        ).to(device)
        tokenizer = AutoTokenizer.from_pretrained(
            base_model_path,
            cache_dir=None,
            model_max_length=1024,
            padding_side="right",
            use_fast=False,
        )
        # the base model and the LoRA weights must end up on the same device
        new_model = PeftModel.from_pretrained(
            base_model,
            lora_model_path,
            device_map="auto",
            torch_dtype=torch.bfloat16,
        ).to(device)
        return tokenizer, new_model


    def format(instance: List[Dict[str, Any]]) -> List[Dict[str, str]]:
        message = []
        for i, ins in enumerate(instance):
            system = ins["system"]
            query = ins["query"]
            answer = ins["answer"]
            if i == 0:
                message.append({"role": "system", "content": system})
            message.append({"role": "user", "content": query})
            # NOTE: processing() expects the final round to end with a user query;
            # if the last item also carries a gold answer, that answer is appended to the prompt too.
            message.append({"role": "assistant", "content": answer})
        return message


    def processing(
        tokenizer: AutoTokenizer,
        messages: List[Dict[str, str]],
        model_max_length: int,
    ) -> List[int]:
        def _parse_messages(
            messages: List[Dict[str, str]], split_role: str = "user"
        ) -> Tuple[str, List[List[Dict[str, str]]]]:
            system, rounds = "", []
            round = []
            for i, message in enumerate(messages):
                if message["role"] == "system":
                    assert i == 0
                    system = message["content"]
                    continue
                # a round is only appended to rounds once it is complete
                if message["role"] == split_role and round:
                    rounds.append(round)
                    round = []
                round.append(message)
            if round:  # append the trailing round as well
                rounds.append(round)
            return system, rounds

        system, rounds = _parse_messages(messages)
        system_ids = tokenizer.encode(f"<|im_start|>system\n{system}<|im_end|>\n")
        input_ids: List[int] = []
        # for multi-turn dialogues keep only the last few rounds, i.e. [n:]
        for i, round in enumerate(rounds[::-1]):  # iterate from the last round backwards
            # one complete round
            text_id = []
            for message in round:
                role, content = message["role"], message["content"]
                if role == "user":
                    if i == 0:  # the final query gets the generation prompt appended
                        cont = f"<|im_start|>user\n{content}<|im_end|>\n<|im_start|>assistant\n"
                    else:
                        cont = f"<|im_start|>user\n{content}<|im_end|>\n"
                else:
                    cont = f"<|im_start|>assistant\n{content}<|im_end|>\n"
                # [user + assistant] or [user]
                text_id = text_id + tokenizer.encode(cont)
            # keep the round only if adding it does not push past the model's maximum length
            if len(system_ids + input_ids + text_id) > model_max_length:
                break
            else:
                input_ids = text_id + input_ids
        # system + [query + answer, ..., query]
        return system_ids + input_ids


    def main(model_max_length: int = 1024, debug: bool = False):
        # initialise the model
        tokenizer, model = init_model(base_model_path, lora_model_path)
        # collect the test files
        paths: List[str] = com.get_paths(test_data_path)
        for path in paths:
            # load and format the data
            message: List[Dict[str, str]] = format(com.load(path))
            text_ids: List[int] = processing(tokenizer, message, model_max_length)
            # debug
            if debug:
                logger.debug(f"{len(text_ids)} {tokenizer.decode(text_ids)}")
            # generate
            input_ids = torch.tensor([text_ids]).to(device)
            generated_ids = model.generate(input_ids=input_ids, max_new_tokens=512)
            generated_ids = [
                output_ids[len(input_ids):]
                for input_ids, output_ids in zip(input_ids, generated_ids)
            ]
            response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
            # save the prediction
            com.dump(
                path.replace(f"{os.sep}handled{os.sep}", f"{os.sep}results{os.sep}"),
                message + [{"role": "predict", "content": response}],
            )


    if __name__ == "__main__":
        main(model_max_length=1024, debug=False)

References

  1. https://github.com/baichuan-inc/Baichuan2
  2. https://github.com/QwenLM/Qwen1.5