
Masked Word Prediction with BERT

The script below uses the HuggingFace transformers library with the bert-base-chinese model: it masks one character of a Chinese sentence, extracts the hidden states with BertModel, and then predicts the masked character back with BertForMaskedLM.
import torch
from transformers import BertTokenizer, BertModel, BertForMaskedLM

# OPTIONAL: if you want more information on what's happening under the hood,
# activate the logger as follows
import logging
logging.basicConfig(level=logging.INFO)

# Load the pre-trained model tokenizer (vocabulary)
tokenizer = BertTokenizer.from_pretrained('bert-base-chinese')

# Tokenize the input
text = "你叫什么名字"
tokenized_text = tokenizer.tokenize(text)

# Mask a token that we will try to predict back with `BertForMaskedLM`
masked_index = 2
tokenized_text[masked_index] = '[MASK]'

# Convert tokens to vocabulary indices; this token-to-id mapping is predefined
indexed_tokens = tokenizer.convert_tokens_to_ids(tokenized_text)

# Define the sentence A and B indices associated with the 1st and 2nd
# sentences (see the BERT paper). There is only one sentence here, so every
# token belongs to segment 0.
segments_ids = [0] * len(tokenized_text)

# Convert inputs to PyTorch tensors
tokens_tensor = torch.tensor([indexed_tokens])
segments_tensors = torch.tensor([segments_ids])

model = BertModel.from_pretrained('bert-base-chinese')

# Set the model in evaluation mode to deactivate the DropOut modules.
# This is IMPORTANT to have reproducible results during evaluation!
model.eval()

# If you have a GPU, put everything on cuda
device = 'cuda' if torch.cuda.is_available() else 'cpu'
tokens_tensor = tokens_tensor.to(device)
segments_tensors = segments_tensors.to(device)
model.to(device)

# Predict the hidden state features for each layer
with torch.no_grad():
    # See the models' docstrings for the details of the inputs
    outputs = model(tokens_tensor, token_type_ids=segments_tensors)
    # Transformers models always output tuples.
    # See the models' docstrings for the details of all the outputs.
    # In our case, the first element is the hidden state of the last layer of
    # the BERT model, a FloatTensor of shape
    # (batch size, sequence length, hidden dimension) = [1, 6, 768];
    # the second element is the pooled output of shape [1, 768].
    encoded_layers = outputs[0]

# Load the pre-trained masked-language model (weights)
model = BertForMaskedLM.from_pretrained('bert-base-chinese')
model.eval()
model.to(device)

# Predict all tokens
with torch.no_grad():
    outputs = model(tokens_tensor, token_type_ids=segments_tensors)
    predictions = outputs[0]

# Confirm that we can predict the masked character (originally '什')
predicted_index = torch.argmax(predictions[0, masked_index]).item()
predicted_token = tokenizer.convert_ids_to_tokens([predicted_index])[0]
print(predicted_token)
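
Taking the argmax returns only the single most likely token. To sanity-check the prediction, it helps to look at a few runner-up candidates as well. A minimal sketch, reusing the `predictions`, `masked_index`, and `tokenizer` variables defined above:

# Inspect the top-5 candidate tokens for the masked position
top_k = torch.topk(predictions[0, masked_index], k=5)
for score, idx in zip(top_k.values, top_k.indices):
    token = tokenizer.convert_ids_to_tokens([idx.item()])[0]
    print(token, score.item())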
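
For reference, recent versions of transformers also ship a high-level fill-mask pipeline that bundles the tokenize / mask / predict / decode steps above into one call. A sketch, assuming a transformers version with pipeline support:

from transformers import pipeline

# The fill-mask pipeline tokenizes the input, runs the model, and decodes
# the top candidates for the [MASK] position in one call.
fill_mask = pipeline('fill-mask', model='bert-base-chinese')
for candidate in fill_mask("你叫[MASK]么名字"):
    print(candidate['token_str'], candidate['score'])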

 
