Parameter-Efficient Fine-Tuning (PEFT), Part 1: A Quick Start with BitFit, Prompt Tuning, and Prefix Tuning
The "improvements" here deserve the scare quotes: P-Tuning was released in March 2021 and Prompt Tuning in April 2021, so the two are essentially contemporaneous work. The idea of Prompt Tuning is simple: freeze all parameters of the base model, prepend a short prompt to the training data, and train only the prompt's representation layer, i.e. a single embedding module. The paper's experiments show that as long as the model is large enough, simply adding prompt tokens and fine-tuning them already gives very good results.
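For reference, a minimal Prompt Tuning setup with the peft library looks roughly like the sketch below. This is an illustrative example rather than code from this post; the base-model name and the initialization text are only placeholders.

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PromptTuningConfig, PromptTuningInit, TaskType, get_peft_model

model_name = "bigscience/bloomz-560m"  # placeholder base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Only the prompt embedding (num_virtual_tokens x hidden_size) is trained; the base model stays frozen
config = PromptTuningConfig(
    task_type=TaskType.CAUSAL_LM,
    num_virtual_tokens=10,
    prompt_tuning_init=PromptTuningInit.TEXT,       # initialize the prompt from real text
    prompt_tuning_init_text="Below is a conversation between a user and an assistant.",
    tokenizer_name_or_path=model_name,
)
model = get_peft_model(model, config)
model.print_trainable_parameters()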
P-Tuning turns the prompt into a learnable embedding layer and adds one more processing step on top of the prompt embedding, using either an MLP or an LSTM (followed by an MLP head). After the LSTM/MLP has encoded these virtual tokens, the result is fed into the model. Let's look at the relevant code in peft\tuners\p_tuning.py:
# peft\tuners\p_tuning.py
class PromptEncoderReparameterizationType(str, enum.Enum):
    MLP = "MLP"
    LSTM = "LSTM"
# peft\tuners\p_tuning.py
@dataclass
class PromptEncoderConfig(PromptLearningConfig):
    encoder_reparameterization_type: Union[str, PromptEncoderReparameterizationType] = field(
        default=PromptEncoderReparameterizationType.MLP,
        metadata={"help": "How to reparameterize the prompt encoder"},
    )
    encoder_hidden_size: int = field(
        default=None,
        metadata={"help": "The hidden size of the prompt encoder"},
    )
    encoder_num_layers: int = field(
        default=2,
        metadata={"help": "The number of layers of the prompt encoder"},
    )
    encoder_dropout: float = field(
        default=0.0,
        metadata={"help": "The dropout of the prompt encoder"},
    )

    def __post_init__(self):
        self.peft_type = PeftType.P_TUNING
class PromptEncoder(torch.nn.Module):
    """
    Input shape: (`batch_size`, `total_virtual_tokens`)
    Output shape: (`batch_size`, `total_virtual_tokens`, `token_dim`)
    """

    def __init__(self, config):
        super().__init__()
        self.token_dim = config.token_dim
        self.input_size = self.token_dim
        self.output_size = self.token_dim
        self.hidden_size = config.encoder_hidden_size
        self.total_virtual_tokens = config.num_virtual_tokens * config.num_transformer_submodules
        self.encoder_type = config.encoder_reparameterization_type

        # embedding table for the virtual tokens
        self.embedding = torch.nn.Embedding(self.total_virtual_tokens, self.token_dim)
        if not config.inference_mode:
            if self.encoder_type == PromptEncoderReparameterizationType.LSTM:
                lstm_dropout = config.encoder_dropout
                num_layers = config.encoder_num_layers
                # LSTM
                self.lstm_head = torch.nn.LSTM(
                    input_size=self.input_size,
                    hidden_size=self.hidden_size,
                    num_layers=num_layers,    # stacked LSTM layers
                    dropout=lstm_dropout,
                    bidirectional=True,       # bidirectional
                    batch_first=True,         # batch dimension comes first
                )

                self.mlp_head = torch.nn.Sequential(
                    torch.nn.Linear(self.hidden_size * 2, self.hidden_size * 2),
                    torch.nn.ReLU(),
                    torch.nn.Linear(self.hidden_size * 2, self.output_size),
                )

            elif self.encoder_type == PromptEncoderReparameterizationType.MLP:
                # note: the MLP branch always builds this fixed stack; encoder_num_layers is not used here
                encoder_num_layers_default = PromptEncoderConfig.encoder_num_layers
                layers = [
                    torch.nn.Linear(self.input_size, self.hidden_size),
                    torch.nn.ReLU(),
                    torch.nn.Linear(self.hidden_size, self.hidden_size),
                    torch.nn.ReLU(),
                    torch.nn.Linear(self.hidden_size, self.output_size),
                ]
                self.mlp_head = torch.nn.Sequential(*layers)

            else:
                raise ValueError("Prompt encoder type not recognized. Please use one of MLP (recommended) or LSTM.")

    def forward(self, indices):
        # 1. embed the virtual-token indices
        input_embeds = self.embedding(indices)
        # 2. re-parameterize (encode) the embeddings
        if self.encoder_type == PromptEncoderReparameterizationType.LSTM:
            output_embeds = self.mlp_head(self.lstm_head(input_embeds)[0])
        elif self.encoder_type == PromptEncoderReparameterizationType.MLP:
            output_embeds = self.mlp_head(input_embeds)
        else:
            raise ValueError("Prompt encoder type not recognized. Please use one of MLP (recommended) or LSTM.")

        return output_embeds
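To make the shapes concrete, here is a small standalone sketch (toy code, not the library itself) that mirrors the MLP branch above with 10 virtual tokens and token_dim = hidden_size = 1024; the output keeps the shape (batch_size, total_virtual_tokens, token_dim):

# Standalone toy version of the MLP re-parameterization above (illustrative only)
import torch

batch_size, total_virtual_tokens, token_dim, hidden_size = 2, 10, 1024, 1024
embedding = torch.nn.Embedding(total_virtual_tokens, token_dim)
mlp_head = torch.nn.Sequential(
    torch.nn.Linear(token_dim, hidden_size),
    torch.nn.ReLU(),
    torch.nn.Linear(hidden_size, hidden_size),
    torch.nn.ReLU(),
    torch.nn.Linear(hidden_size, token_dim),
)

indices = torch.arange(total_virtual_tokens).unsqueeze(0).expand(batch_size, -1)
output_embeds = mlp_head(embedding(indices))
print(output_embeds.shape)  # torch.Size([2, 10, 1024])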
The relevant PeftModelForCausalLM code in peft\peft_model.py is shown below. It uses the peft_type in the config to decide whether the PEFT method is Prefix Tuning / P-Tuning v2 or Prompt Tuning / P-Tuning v1. For the latter, the virtual tokens' embeddings are concatenated directly in front of the original input sequence, and the result is fed into the base model for the forward pass.

# peft\peft_model.py
if peft_config.peft_type == PeftType.PREFIX_TUNING:
    # PREFIX_TUNING: virtual-token embeddings are added to the key and value of every transformer block
    ......
else:
    # Prompt Tuning / P-Tuning v1 branch
    if inputs_embeds is None:
        # embeddings of the input content outside the prompt
        inputs_embeds = self.word_embeddings(input_ids)

    # concat prompt labels
    if labels is not None:
        prefix_labels = torch.full((batch_size, peft_config.num_virtual_tokens), -100).to(labels.device)
        kwargs["labels"] = torch.cat((prefix_labels, labels), dim=1)

    # embeddings of the prompt itself
    prompts = self.get_prompt(batch_size=batch_size)
    prompts = prompts.to(inputs_embeds.dtype)
    # concatenate the prompt embeddings with the original input embeddings and send them to the base model
    inputs_embeds = torch.cat((prompts, inputs_embeds), dim=1)
    return self.base_model(inputs_embeds=inputs_embeds, **kwargs)
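One detail worth calling out in this branch: the labels are left-padded with -100 at the virtual-token positions so that the prompt does not contribute to the language-modeling loss (CrossEntropyLoss ignores -100). A toy illustration of just that bookkeeping, with made-up shapes:

# Toy illustration of the prompt/label concatenation above (made-up shapes)
import torch

batch_size, num_virtual_tokens, seq_len, token_dim = 2, 10, 6, 1024
inputs_embeds = torch.randn(batch_size, seq_len, token_dim)        # embeddings of the real input
prompts = torch.randn(batch_size, num_virtual_tokens, token_dim)   # stands in for get_prompt()
labels = torch.randint(0, 100, (batch_size, seq_len))

prefix_labels = torch.full((batch_size, num_virtual_tokens), -100)  # ignored by the loss
labels = torch.cat((prefix_labels, labels), dim=1)                  # shape (2, 16)
inputs_embeds = torch.cat((prompts, inputs_embeds), dim=1)          # shape (2, 16, 1024)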
We only need to add the peft code after loading the original model and before setting up the trainer.
from peft import PromptEncoderConfig, TaskType, get_peft_model, PromptEncoderReparameterizationType

config = PromptEncoderConfig(task_type=TaskType.CAUSAL_LM,
                             num_virtual_tokens=10,
                             encoder_reparameterization_type=PromptEncoderReparameterizationType.MLP,
                             encoder_dropout=0.1,
                             encoder_num_layers=5,
                             encoder_hidden_size=1024)
model = get_peft_model(model, config)
# Print trainable-parameter statistics
model.print_trainable_parameters()

trainable params: 3,159,040 || all params: 348,928,000 || trainable%: 0.9053558327219369
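The 3,159,040 trainable parameters can be reproduced by hand if we assume the base model's hidden size (token_dim) is 1024: the prompt embedding contributes 10 × 1024 parameters, and, as the PromptEncoder code above shows, the MLP branch always builds the same three Linear(1024, 1024) layers (encoder_num_layers is not actually used there):

# Rough parameter count for the MLP prompt encoder (assumes token_dim = 1024)
num_virtual_tokens, token_dim, hidden = 10, 1024, 1024
embedding_params = num_virtual_tokens * token_dim     # 10,240
mlp_params = 3 * (token_dim * hidden + hidden)        # three Linear(1024, 1024) layers: 3,148,800
print(embedding_params + mlp_params)                  # 3,159,040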
(base) root@autodl-container-adbc11ae52-f2ebff02:~# nvidia-smi
Tue May 28 15:15:53 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.89.02    Driver Version: 525.89.02    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:B1:00.0 Off |                  N/A |
| 33%   59C    P2   168W / 250W |   2870MiB / 11264MiB |     45%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
Prompt Tuning, P-Tuning, and similar methods have two main problems.

First, they lack universality across model scales and tasks. For smaller models (roughly 100M to 1B parameters), prompt tuning performs far worse than full fine-tuning, which greatly limits its applicability.

Second, they lack deep prompt optimization. In Prompt Tuning and P-Tuning, the prompt is only inserted into the input embedding sequence of the first transformer layer; in all subsequent transformer layers, the embeddings at the prompt positions are simply computed by the previous layers, so the prompt's direct influence (and the number of trainable parameters) is limited.

To address these problems, the authors proposed P-Tuning v2.
# peft/tuners/prefix_tuning.py
# Based on https://github.com/THUDM/P-tuning-v2/blob/main/model/prefix_encoder.py
# with some refactor
class PrefixEncoder(torch.nn.Module):
    def __init__(self, config):
        super().__init__()
        self.prefix_projection = config.prefix_projection
        token_dim = config.token_dim
        num_layers = config.num_layers
        encoder_hidden_size = config.encoder_hidden_size
        num_virtual_tokens = config.num_virtual_tokens
        if self.prefix_projection and not config.inference_mode:
            # Use a two-layer MLP to encode the prefix
            # Prefix Tuning: re-parameterize the prefix through an MLP
            self.embedding = torch.nn.Embedding(num_virtual_tokens, token_dim)
            self.transform = torch.nn.Sequential(
                torch.nn.Linear(token_dim, encoder_hidden_size),
                torch.nn.Tanh(),
                torch.nn.Linear(encoder_hidden_size, num_layers * 2 * token_dim),
            )
        else:
            # P-Tuning v2
            self.embedding = torch.nn.Embedding(num_virtual_tokens, num_layers * 2 * token_dim)

    def forward(self, prefix: torch.Tensor):
        if self.prefix_projection:
            # Prefix Tuning
            # first embed: input shape (batch_size, num_virtual_tokens) -> (batch_size, num_virtual_tokens, token_dim)
            # then re-parameterize: shape (batch_size, num_virtual_tokens, num_layers * 2 * token_dim)
            prefix_tokens = self.embedding(prefix)
            past_key_values = self.transform(prefix_tokens)
        else:
            # P-Tuning v2: no re-parameterization
            past_key_values = self.embedding(prefix)
        return past_key_values
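The PrefixEncoder output has shape (batch_size, num_virtual_tokens, num_layers * 2 * token_dim); before it reaches the attention blocks, it is split into one (key, value) pair per transformer layer. The sketch below only shows where the factor num_layers * 2 comes from; the concrete geometry (24 layers, 16 heads of size 64) is an assumption for illustration, and the exact permutation used by peft depends on the model architecture:

# Simplified illustration of splitting the PrefixEncoder output into per-layer keys/values
import torch

batch_size, num_virtual_tokens = 2, 10
num_layers, num_heads, head_dim = 24, 16, 64   # assumed model geometry, token_dim = 1024
token_dim = num_heads * head_dim

past_key_values = torch.randn(batch_size, num_virtual_tokens, num_layers * 2 * token_dim)
past_key_values = past_key_values.view(
    batch_size, num_virtual_tokens, num_layers * 2, num_heads, head_dim
)
# one (key, value) pair of shape (batch, heads, virtual_tokens, head_dim) per transformer layer
past_key_values = past_key_values.permute(2, 0, 3, 1, 4).split(2)
print(len(past_key_values), past_key_values[0].shape)  # 24 torch.Size([2, 2, 16, 10, 64])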
P-Tuning v2 essentially follows the same recipe as Prefix Tuning; it can be seen as adapting Prefix Tuning, originally used for text generation, to NLU tasks, with a few adjustments such as applying prompts at every transformer layer (deep prompt tuning) and dropping the MLP re-parameterization (prefix_projection=False in the code above).
Again, we only need to add the peft code after loading the original model and before setting up the trainer.
from peft import PrefixTuningConfig, get_peft_model, TaskType
# Unlike Prefix Tuning, prefix_projection is set to False (this gives P-Tuning v2)
config = PrefixTuningConfig(task_type=TaskType.CAUSAL_LM, num_virtual_tokens=10, prefix_projection=False)
model = get_peft_model(model, config)
# Print trainable-parameter statistics
model.print_trainable_parameters()
trainable params: 491,520 || all params: 346,260,480 || trainable%: 0.1419509382069822
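Far fewer parameters are trained here than with the MLP prompt encoder above, because only an embedding table of shape (num_virtual_tokens, num_layers * 2 * token_dim) is learned. Assuming the base model has 24 layers and hidden size 1024, the printed number matches exactly:

# Trainable parameters with prefix_projection=False (P-Tuning v2 style)
# assumes the base model has 24 layers and token_dim = 1024
num_virtual_tokens, num_layers, token_dim = 10, 24, 1024
print(num_virtual_tokens * num_layers * 2 * token_dim)  # 491,520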
(base) root@autodl-container-adbc11ae52-f2ebff02:~# nvidia-smi
Tue May 28 15:18:39 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.89.02    Driver Version: 525.89.02    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:B1:00.0 Off |                  N/A |
| 33%   56C    P2   189W / 250W |   2826MiB / 11264MiB |     45%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+