
Problems encountered when training a BERT model_exception: model "bert-base-uncased" on the hub do

exception: model "bert-base-uncased" on the hub doesn't have a tokenizer

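1. Difference between tokenizer.encode() and tokenizer.tokenize():
(1) tokenizer.encode() returns the id of each token in the vocabulary. By default it also adds the special tokens [CLS] and [SEP] at the start and end of the sequence.

(2) tokenizer.tokenize() returns the tokens themselves as strings, without converting them to ids and without adding special tokens.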

    from transformers import BertConfig, BertModel, BertTokenizer


    def bert_():
        model_name = 'bert-base-chinese'
        MODEL_PATH = 'F:/models/bert-base-chinese/'
        # a. Load the tokenizer from its vocabulary
        tokenizer = BertTokenizer.from_pretrained(model_name)
        # b. Load the configuration file
        model_config = BertConfig.from_pretrained(model_name)
        # Modify the configuration so the model also returns
        # all hidden states and attention weights
        model_config.output_hidden_states = True
        model_config.output_attentions = True
        # Load the model from the local path using the modified configuration
        bert_model = BertModel.from_pretrained(MODEL_PATH, config=model_config)
        # tokenizer.encode(): returns vocabulary ids
        sen_code_encode = tokenizer.encode("自然语")
        print("sen_code_encode", sen_code_encode)
        # tokenizer.tokenize(): returns tokens
        sen_code_tokenizer = tokenizer.tokenize("自然语")
        print("sen_code_tokenizer", sen_code_tokenizer)


    if __name__ == '__main__':
        bert_()
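
The relationship between the two methods can be sketched without downloading any model. The snippet below is only an illustration, not the transformers implementation: `TOY_VOCAB`, `toy_tokenize`, and `toy_encode` are hypothetical names, the ids are made up and are not those of bert-base-chinese, and splitting one token per character is a simplifying assumption that mimics how BERT handles Chinese text.

```python
# Toy vocabulary. The ids are invented for illustration only;
# they are NOT the real ids used by bert-base-chinese.
TOY_VOCAB = {
    "[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3,
    "自": 4, "然": 5, "语": 6,
}


def toy_tokenize(text):
    """Return tokens (strings): one per character, unknown characters become [UNK]."""
    return [ch if ch in TOY_VOCAB else "[UNK]" for ch in text if not ch.isspace()]


def toy_encode(text, add_special_tokens=True):
    """Return ids: tokenize, look each token up, then optionally wrap in [CLS]/[SEP]."""
    ids = [TOY_VOCAB[tok] for tok in toy_tokenize(text)]
    if add_special_tokens:
        ids = [TOY_VOCAB["[CLS]"]] + ids + [TOY_VOCAB["[SEP]"]]
    return ids


tokens = toy_tokenize("自然语")
ids = toy_encode("自然语")
ids_plain = toy_encode("自然语", add_special_tokens=False)

print("tokens:", tokens)        # ['自', '然', '语']
print("ids:", ids)              # [2, 4, 5, 6, 3]
print("ids_plain:", ids_plain)  # [4, 5, 6]
```

With the real tokenizer the same pattern holds: the list returned by encode() is two elements longer than the list returned by tokenize(), because of the added [CLS] and [SEP] ids.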