BERT pretraining process

Author: Monodyee | 2024-03-06 17:12:04



First, find the relevant part of Trainer.train().
The Trainer used here is exported from the package's __init__.py:

from .trainer import Trainer, set_seed, torch_distributed_zero_first, EvalPrediction

The input_ids and labels of one training example are:

input_ids = 
[2, 193, 194, 8982, 23, 4, 15, 1073, 3, 418, 43, 13, 319, 8981, 
 4622, 258, 4937, 4, 36, 864, 339, 1162, 3]
labels = 
[-100, -100, -100, -100, -100, 453, -100, -100, -100, -100, -100, 
 -100, -100, -100, -100, -100, -100, 83, -100, -100, -100, -100, -100]

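Positions 5 and 17 (counting from 0) are the masked ones: there the labels hold the original token ids, 453 and 83, while input_ids holds the replaced value 4 (presumably the [MASK] id in this vocabulary). Every other position has the label -100 and is not predicted. Masking works on n-grams, with n = [1, 2, 3] chosen with probabilities [0.7, 0.2, 0.1]. For each candidate span a random number p is drawn: if p falls in (0, 0.15×0.8), the span is replaced by [MASK] and the model predicts the original tokens; if p falls in (0.15×0.8, 0.15×0.9), the tokens are kept unchanged but are still predicted; if p falls in (0.15×0.9, 1), the tokens are left as they are and not predicted.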
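The masking rule above can be sketched in plain Python as follows. This is a minimal illustration, not the author's code: the function name `ngram_mask`, the `special_ids` parameter, and the ids 2 = [CLS], 3 = [SEP], 4 = [MASK] are assumptions read off the example. It implements only the three cases described in the text; Google's original BERT recipe additionally replaces the token with a random one in the remaining (0.15×0.9, 0.15) interval, which is left out here.

```python
import random

NGRAMS = [1, 2, 3]             # span lengths
NGRAM_PROBS = [0.7, 0.2, 0.1]  # probability of each span length
MASK_PROB = 0.15


def ngram_mask(input_ids, mask_id, special_ids, rng=random):
    """Hypothetical helper: returns (masked_ids, labels); -100 marks positions not predicted."""
    masked = list(input_ids)
    labels = [-100] * len(input_ids)
    i = 0
    while i < len(input_ids):
        if input_ids[i] in special_ids:   # never mask [CLS] / [SEP]
            i += 1
            continue
        n = rng.choices(NGRAMS, weights=NGRAM_PROBS)[0]
        span = [j for j in range(i, min(i + n, len(input_ids)))
                if input_ids[j] not in special_ids]
        p = rng.random()
        if p < MASK_PROB * 0.8:           # (0, 0.12): replace with [MASK], predict original
            for j in span:
                masked[j] = mask_id
                labels[j] = input_ids[j]
        elif p < MASK_PROB * 0.9:         # (0.12, 0.135): keep token, still predict it
            for j in span:
                labels[j] = input_ids[j]
        # otherwise (0.135, 1): leave the span unchanged and do not predict it
        i += n
    return masked, labels


# Usage with made-up, unmasked ids (assumed: 2 = [CLS], 3 = [SEP], 4 = [MASK])
ids = [2, 193, 194, 8982, 23, 15, 1073, 3]
masked, labels = ngram_mask(ids, mask_id=4, special_ids={2, 3}, rng=random.Random(0))
```

Because a whole span shares one draw of p, adjacent tokens are masked or kept together, which is the point of n-gram masking.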
On the output side, a prediction head is added on top of the base BERT model:

  (cls): BertOnlyMLMHead(
    (predictions): BertLMPredictionHead(
      (transform): BertPredictionHeadTransform(
        (dense): Linear(in_features=768, out_features=768, bias=True)
        (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      )
      (decoder): Linear(in_features=768, out_features=21128, bias=True)
    )
  )
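To make the printed structure concrete, here is a small PyTorch module with the same layers in the same order. It is a sketch, not the transformers implementation: the class name is made up, the activation is assumed to be GELU (BERT's default `hidden_act`), and the decoder weight is left untied, whereas BERT ties it to the word-embedding matrix.

```python
import torch
from torch import nn


class MLMHeadSketch(nn.Module):
    """Toy version of BertOnlyMLMHead: dense -> GELU -> LayerNorm -> decoder."""

    def __init__(self, hidden_size: int = 768, vocab_size: int = 21128, eps: float = 1e-12):
        super().__init__()
        self.dense = nn.Linear(hidden_size, hidden_size)      # (transform).dense
        self.act = nn.GELU()                                  # assumption: hidden_act == "gelu"
        self.layer_norm = nn.LayerNorm(hidden_size, eps=eps)  # (transform).LayerNorm
        # (decoder): untied here; in BERT this weight is shared with the word embeddings
        self.decoder = nn.Linear(hidden_size, vocab_size)

    def forward(self, sequence_output: torch.Tensor) -> torch.Tensor:
        h = self.layer_norm(self.act(self.dense(sequence_output)))
        return self.decoder(h)  # (batch, seq_len, vocab_size) prediction scores


head = MLMHeadSketch(hidden_size=16, vocab_size=50)  # small sizes, just to check shapes
scores = head(torch.randn(2, 7, 16))
print(scores.shape)  # torch.Size([2, 7, 50])
```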

Finally, the loss is the cross-entropy between the predictions for input_ids and the labels:

from torch.nn import CrossEntropyLoss

# excerpt from BertForMaskedLM.forward (transformers)
sequence_output = outputs[0]
prediction_scores = self.cls(sequence_output)
masked_lm_loss = None
if labels is not None:
    loss_fct = CrossEntropyLoss()  # ignore_index defaults to -100: positions labeled -100 are skipped
    masked_lm_loss = loss_fct(prediction_scores.view(-1, self.config.vocab_size), labels.view(-1))

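Here prediction_scores.size() is torch.Size([10, 13, 9448]) and labels.size() is torch.Size([10, 13]): batch size 10, sequence length 13 and, in this run, a vocabulary of 9448 tokens. The two view calls flatten them to (130, 9448) and (130,) before the loss is computed.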
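The effect of the -100 labels can be checked with random tensors of the shapes quoted above. The labels here are hypothetical (one "masked" position per row); the sketch shows that `CrossEntropyLoss`, whose `ignore_index` defaults to -100, gives the same value as averaging only over the positions that are actually predicted.

```python
import torch
from torch.nn import CrossEntropyLoss

torch.manual_seed(0)
batch, seq_len, vocab_size = 10, 13, 9448  # the shapes quoted above

prediction_scores = torch.randn(batch, seq_len, vocab_size)
# Hypothetical labels: -100 everywhere except one "masked" position per row
labels = torch.full((batch, seq_len), -100, dtype=torch.long)
labels[:, 5] = torch.randint(0, vocab_size, (batch,))

loss_fct = CrossEntropyLoss()  # ignore_index defaults to -100
flat_scores = prediction_scores.view(-1, vocab_size)  # (130, 9448)
flat_labels = labels.view(-1)                          # (130,)
masked_lm_loss = loss_fct(flat_scores, flat_labels)

# Same value when the -100 positions are dropped explicitly
keep = flat_labels != -100
manual_loss = CrossEntropyLoss()(flat_scores[keep], flat_labels[keep])
print(flat_scores.shape, int(keep.sum()))  # torch.Size([130, 9448]) 10
```

So the unmasked tokens contribute nothing to the gradient; only the masked positions drive MLM training.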
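Next, next sentence prediction (NSP) in pretraining.
NSP is implemented in the BertForNextSentencePrediction class of transformers.
Each NSP example contains exactly two sentences.
For example, with a maximum length of 50, the input can look like this: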

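[CLS]谷歌和[MASK][MASK]都是不存在的。[SEP]同时，[MASK]也是不存在的。[SEP]
(roughly: "[CLS] Google and [MASK][MASK] do not exist. [SEP] Likewise, [MASK] does not exist. [SEP]")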
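The sentence-pair layout above can be sketched as follows. The helper `build_nsp_example` is hypothetical, and the tokens are split character by character, which is an assumption that matches how Chinese BERT vocabularies usually treat Chinese text. Segment ids are 0 for `[CLS] A [SEP]` and 1 for `B [SEP]`; the pair is then padded to the maximum length of 50.

```python
def build_nsp_example(tokens_a, tokens_b, max_len=50, pad_token="[PAD]"):
    """Hypothetical helper: builds tokens, token_type_ids and attention_mask for one pair."""
    tokens = ["[CLS]"] + tokens_a + ["[SEP]"] + tokens_b + ["[SEP]"]
    if len(tokens) > max_len:
        raise ValueError("pair too long; truncate tokens_a / tokens_b first")
    token_type_ids = [0] * (len(tokens_a) + 2) + [1] * (len(tokens_b) + 1)
    attention_mask = [1] * len(tokens)
    pad = max_len - len(tokens)
    return (tokens + [pad_token] * pad,
            token_type_ids + [0] * pad,
            attention_mask + [0] * pad)


tokens_a = ["谷", "歌", "和", "[MASK]", "[MASK]", "都", "是", "不", "存", "在", "的", "。"]
tokens_b = ["同", "时", "，", "[MASK]", "也", "是", "不", "存", "在", "的", "。"]
tokens, token_type_ids, attention_mask = build_nsp_example(tokens_a, tokens_b, max_len=50)
print(len(tokens), sum(attention_mask))  # 50 26
```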

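Here batch_size = 1 (to run several NSP examples at once, simply build a batch with batch_size > 1), so the input has shape (1, 50). BertModel outputs a (1, 50, 768) tensor. The pooler then takes the hidden state at position 0 (the [CLS] token), giving (1, 768), and passes it through a dense layer and a tanh activation, which keeps the shape (1, 768). A linear layer of shape (hidden_size, 2) produces the logits [[-3.0729, 5.9056]], and finally the cross-entropy loss is computed between these logits and the label [[1]]. (In the transformers convention for this label, 0 means sentence B continues sentence A and 1 means sentence B is a random sentence.)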
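The NSP path can be sketched in PyTorch as below. The class and attribute names are made up; the order (select the [CLS] position first, then dense + tanh) follows transformers' `BertPooler`, and the last lines recompute the loss for the quoted logits and label, which comes out at about 1.26e-4 because the logits strongly favour class 1.

```python
import math

import torch
from torch import nn
from torch.nn import CrossEntropyLoss


class NSPHeadSketch(nn.Module):
    """Toy pooler + NSP classifier: [CLS] vector -> dense -> tanh -> Linear(hidden, 2)."""

    def __init__(self, hidden_size: int = 768):
        super().__init__()
        self.pool_dense = nn.Linear(hidden_size, hidden_size)
        self.seq_relationship = nn.Linear(hidden_size, 2)

    def forward(self, sequence_output: torch.Tensor) -> torch.Tensor:
        cls_vec = sequence_output[:, 0]                # (batch, hidden): position 0 = [CLS]
        pooled = torch.tanh(self.pool_dense(cls_vec))  # (batch, hidden)
        return self.seq_relationship(pooled)           # (batch, 2) logits


head = NSPHeadSketch()
logits = head(torch.randn(1, 50, 768))  # stands in for BertModel's (1, 50, 768) output
print(logits.shape)  # torch.Size([1, 2])

# Loss for the logits and label quoted in the text
quoted_logits = torch.tensor([[-3.0729, 5.9056]])
label = torch.tensor([[1]])
loss = CrossEntropyLoss()(quoted_logits.view(-1, 2), label.view(-1))
expected = math.log(1 + math.exp(-3.0729 - 5.9056))  # -log softmax(logits)[1]
```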
Also note how the parameters are initialized before pretraining:

class BertPreTrainedModel(PreTrainedModel):
    """
    An abstract class to handle weights initialization and a simple interface for downloading and loading pretrained
    models.
    """
    config_class = BertConfig
    load_tf_weights = load_tf_weights_in_bert
    base_model_prefix = "bert"
    _keys_to_ignore_on_load_missing = [r"position_ids"]

    def _init_weights(self, module):
        """ Initialize the weights """
        if isinstance(module, (nn.Linear, nn.Embedding)):
            # Slightly different from the TF version which uses truncated_normal for initialization
            # cf https://github.com/pytorch/pytorch/pull/5617
            module.weight.data.normal_(mean=0.0, std=self.config.initializer_range)
        elif isinstance(module, nn.LayerNorm):
            module.bias.data.zero_()
            module.weight.data.fill_(1.0)
        if isinstance(module, nn.Linear) and module.bias is not None:
            module.bias.data.zero_()
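The same initialization rule can be tried outside transformers with `nn.Module.apply`. This is a sketch that copies the logic of `_init_weights` into a plain function; the function name and toy model are hypothetical, and 0.02 is `BertConfig`'s default `initializer_range`.

```python
import torch
from torch import nn

INITIALIZER_RANGE = 0.02  # BertConfig's default initializer_range


def init_weights(module: nn.Module, std: float = INITIALIZER_RANGE) -> None:
    """Copy of the _init_weights logic above, usable with model.apply()."""
    if isinstance(module, (nn.Linear, nn.Embedding)):
        module.weight.data.normal_(mean=0.0, std=std)  # N(0, 0.02) weights
    elif isinstance(module, nn.LayerNorm):
        module.bias.data.zero_()                       # LayerNorm: beta = 0
        module.weight.data.fill_(1.0)                  # LayerNorm: gamma = 1
    if isinstance(module, nn.Linear) and module.bias is not None:
        module.bias.data.zero_()                       # Linear biases start at 0


torch.manual_seed(0)
toy = nn.Sequential(nn.Embedding(1000, 64), nn.Linear(64, 64), nn.LayerNorm(64))
toy.apply(init_weights)  # visits every submodule recursively
```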

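Disclaimer: this article was contributed by a user and does not represent the position of the wpsshop blog; copyright belongs to the original author. If you find infringing content, please contact us. When reposting, please cite the source: https://www.wpsshop.cn/w/Monodyee/article/detail/200375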