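Bert Model with two heads on top as done during the pretraining: a masked language modeling
head and a next sentence prediction (classification) head.
MLM head: can loosely be thought of as a single fully connected layer that predicts the masked tokens (in practice it is a small stack: linear(hidden_size -> hidden_size) -> activation -> LayerNorm -> linear(hidden_size -> vocab_size)).
NSP head: next sentence prediction, also a fully connected layer, hidden_size -> 2.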
- from torch import nn
- from transformers.models.bert.modeling_bert import BertLMPredictionHead
- class BertPreTrainingHeads(nn.Module):
-     def __init__(self, config):
-         super().__init__()
-         self.predictions = BertLMPredictionHead(config)  # MLM head
-         self.seq_relationship = nn.Linear(config.hidden_size, 2)  # NSP head
-     def forward(self, sequence_output, pooled_output):
-         prediction_scores = self.predictions(sequence_output)  # per-token vocabulary scores
-         seq_relationship_score = self.seq_relationship(pooled_output)  # 2-way NSP scores
-         return prediction_scores, seq_relationship_score
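The MLM head pipeline described above (linear, activation, LayerNorm, linear to vocabulary size) can be sketched in plain PyTorch. This is a minimal illustration, not the library's implementation: the class name `TinyMLMHead` and all sizes are hypothetical, GELU is assumed as the activation, and the weight tying between the final projection and the input embeddings used in the real model is left out.

```python
import torch
from torch import nn


class TinyMLMHead(nn.Module):
    """Hypothetical sketch of an MLM head: dense -> activation -> LayerNorm -> decoder."""

    def __init__(self, hidden_size: int, vocab_size: int):
        super().__init__()
        self.dense = nn.Linear(hidden_size, hidden_size)
        self.act = nn.GELU()  # assumed activation
        self.layer_norm = nn.LayerNorm(hidden_size)
        self.decoder = nn.Linear(hidden_size, vocab_size)

    def forward(self, sequence_output: torch.Tensor) -> torch.Tensor:
        # Transform each token's hidden state, then project it onto the vocabulary.
        x = self.layer_norm(self.act(self.dense(sequence_output)))
        return self.decoder(x)


head = TinyMLMHead(hidden_size=16, vocab_size=100)
sequence_output = torch.randn(2, 5, 16)  # (batch_size, seq_len, hidden_size)
prediction_scores = head(sequence_output)
print(prediction_scores.shape)  # torch.Size([2, 5, 100])
```

Every token position gets its own score vector over the vocabulary; during pretraining only the masked positions contribute to the loss.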
Bert Model with a language modeling
head on top for CLM fine-tuning.
Has only the MLM head; the training objective is to predict the current token from the preceding tokens, i.e. causal language modeling (CLM).
- import torch
- from transformers import AutoTokenizer, BertLMHeadModel
- tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
- model = BertLMHeadModel.from_pretrained("bert-base-uncased")
- inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
- outputs = model(**inputs, labels=inputs["input_ids"])
- loss = outputs.loss
- logits = outputs.logits
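Passing `labels=inputs["input_ids"]` works for CLM because the loss compares the prediction at position t with the token at position t + 1. The sketch below reproduces that shift-by-one on random tensors; the sizes and values are made up, and it is meant as an illustration of the idea rather than the library's exact code.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
batch_size, seq_len, vocab_size = 2, 6, 50  # hypothetical sizes

logits = torch.randn(batch_size, seq_len, vocab_size)  # stand-in for model output
input_ids = torch.randint(0, vocab_size, (batch_size, seq_len))

# Position t predicts token t + 1: drop the last prediction and the first label.
shifted_logits = logits[:, :-1, :]
shifted_labels = input_ids[:, 1:]

clm_loss = F.cross_entropy(
    shifted_logits.reshape(-1, vocab_size),
    shifted_labels.reshape(-1),
)
```

Note that the shift alone does not make the model causal; the attention mask must also prevent each position from seeing later tokens.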
Bert Model with a language modeling
head on top.
Has only the MLM head; the training objective is simply to predict the masked tokens.
- from transformers import AutoTokenizer, BertForMaskedLM
- import torch
- tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
- model = BertForMaskedLM.from_pretrained("bert-base-uncased")
- inputs = tokenizer("The capital of France is [MASK].", return_tensors="pt")
- with torch.no_grad():
-     logits = model(**inputs).logits
- # retrieve index of [MASK]
- mask_token_index = (inputs.input_ids == tokenizer.mask_token_id)[0].nonzero(as_tuple=True)[0]
- predicted_token_id = logits[0, mask_token_index].argmax(axis=-1)
- tokenizer.decode(predicted_token_id)
- labels = tokenizer("The capital of France is Paris.", return_tensors="pt")["input_ids"]
- # mask labels of non-[MASK] tokens
- labels = torch.where(inputs.input_ids == tokenizer.mask_token_id, labels, -100)
- outputs = model(**inputs, labels=labels)
- round(outputs.loss.item(), 2)

Bert Model with a next sentence prediction (classification)
head on top.
Bert Model transformer with a sequence classification/regression head on top (a linear layer on top of the pooled output) e.g. for GLUE tasks.
A single fully connected head whose output dimension equals the number of classes.
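The sequence classification head described above can be sketched as a dropout followed by one linear layer on the pooled output. The tensor sizes, the dropout rate, and the random `pooled_output` are assumptions for illustration only.

```python
import torch
from torch import nn

torch.manual_seed(0)
hidden_size, num_labels = 16, 3  # hypothetical sizes

dropout = nn.Dropout(p=0.1)  # assumed dropout rate
classifier = nn.Linear(hidden_size, num_labels)

pooled_output = torch.randn(4, hidden_size)  # stand-in for BERT's pooled output
logits = classifier(dropout(pooled_output))  # (batch_size, num_labels)

labels = torch.tensor([0, 2, 1, 2])
loss = nn.CrossEntropyLoss()(logits, labels)
```

For a regression task the same layer would have a single output, and a mean-squared-error loss would be the usual substitute for cross-entropy.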
Bert Model with a multiple choice classification head on top (a linear layer on top of the pooled output and a softmax) e.g. for RocStories/SWAG tasks.
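The correct answer should receive the highest score; the head uses a softmax over the choices with a cross-entropy loss (somewhat similar to a listwise loss in text matching).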
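A minimal sketch of the multiple choice scoring, assuming each (context, choice) pair has already been encoded into one pooled vector: a linear layer produces one score per choice, the scores are reshaped to one row per example, and cross-entropy applies the softmax across the choices. All sizes and the random inputs are hypothetical.

```python
import torch
from torch import nn
import torch.nn.functional as F

torch.manual_seed(0)
batch_size, num_choices, hidden_size = 2, 4, 16  # hypothetical sizes

# One pooled vector per (example, choice) pair, as if the encoder had been run
# on inputs flattened to (batch_size * num_choices, seq_len).
pooled_output = torch.randn(batch_size * num_choices, hidden_size)

scorer = nn.Linear(hidden_size, 1)  # one scalar score per choice
choice_logits = scorer(pooled_output).view(batch_size, num_choices)

labels = torch.tensor([1, 3])  # index of the correct choice for each example
loss = F.cross_entropy(choice_logits, labels)  # softmax over choices + NLL
probs = choice_logits.softmax(dim=-1)  # per-example distribution over choices
```

Because the softmax normalizes over the choices of a single example, raising the score of the correct choice necessarily lowers the probability of the others, which is where the resemblance to a listwise loss comes from.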
Bert Model with a span classification head on top for extractive question-answering tasks like SQuAD (a linear layer on top of the hidden-states output to compute span start logits
and span end logits).
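The span head can be sketched as a single linear layer mapping each token's hidden state to two numbers, which are then split into start and end logits. The sizes, the random `sequence_output`, and the averaging of the two losses are assumptions made for this illustration.

```python
import torch
from torch import nn
import torch.nn.functional as F

torch.manual_seed(0)
batch_size, seq_len, hidden_size = 2, 8, 16  # hypothetical sizes

sequence_output = torch.randn(batch_size, seq_len, hidden_size)  # stand-in for BERT output
qa_outputs = nn.Linear(hidden_size, 2)  # two scores per token: start and end

logits = qa_outputs(sequence_output)  # (batch_size, seq_len, 2)
start_logits, end_logits = logits.split(1, dim=-1)
start_logits = start_logits.squeeze(-1)  # (batch_size, seq_len)
end_logits = end_logits.squeeze(-1)  # (batch_size, seq_len)

# Gold answer spans given as token indices (made-up values).
start_positions = torch.tensor([1, 3])
end_positions = torch.tensor([2, 5])
start_loss = F.cross_entropy(start_logits, start_positions)
end_loss = F.cross_entropy(end_logits, end_positions)
loss = (start_loss + end_loss) / 2
```

At inference time the predicted answer is typically the span whose start and end scores are jointly highest, subject to the start not coming after the end.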