
Simplify the Usage of Lexicon in Chinese NER: pitfalls (and fixes) when running the paper's code


The version info given on GitHub: Python 3.6, PyTorch 0.4.1.

Pitfalls:

1. After setup, running the code complains that transformers is missing. A plain `pip install transformers` pulls a version that is too new and incompatible.

Fix: a quick search suggested that transformers 3.4.0 is the version most people use, so I installed 3.4.0.

2. With 3.4.0 installed, `import transformers` fails with `ImportError: cannot import name '_softmax_backward_data'`.

Fix: downgrade again, this time to transformers 2.1.1, and the import works.
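Since two of the pitfalls above are pure version mismatches, a small guard that fails fast with a clear message can save a lot of guessing. This is a minimal sketch of my own, not part of the paper's repo; the package name and the `2.1.1` pin come from the steps above.

```python
# Hedged sketch: fail fast when an installed package does not match the
# version this codebase expects. The helper is illustrative, not from the repo.
import importlib


def require_version(module_name, expected):
    """Import module_name and raise RuntimeError unless __version__ matches."""
    mod = importlib.import_module(module_name)
    found = getattr(mod, "__version__", "unknown")
    if found != expected:
        raise RuntimeError(
            "%s==%s required, found %s; try: pip install %s==%s"
            % (module_name, expected, found, module_name, expected)
        )
    return found


# Example (uncomment once transformers is installed):
# require_version("transformers", "2.1.1")
```

Calling this once at the top of `main.py` turns a cryptic mid-run `ImportError` into an actionable install command.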

3. Running the program again produces this error:

  Traceback (most recent call last):
    File "D:\program\envs\pytorch\lib\site-packages\urllib3\connection.py", line 175, in _new_conn
      (self._dns_host, self.port), self.timeout, **extra_kw
    File "D:\program\envs\pytorch\lib\site-packages\urllib3\util\connection.py", line 95, in create_connection
      raise err
    File "D:\program\envs\pytorch\lib\site-packages\urllib3\util\connection.py", line 85, in create_connection
      sock.connect(sa)
  TimeoutError: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.

Fix: at first I assumed it was a network problem. I tried toggling the firewall and going through a VPN, to no avail. Then I stepped through the code in the debugger and found that it needs to reach a URL:

url: 'https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-chinese-vocab.txt'

It dawned on me that the BERT files had never been installed; I had to download them myself first (I admit I'm a noob, sob) and put them under this path:

path = 'C:\\Users\\yuyuan\\.pytorch_pretrained_bert\\bert-base-chinese'
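The one-time download can also be scripted instead of done by hand. This is a hedged sketch of mine: only the URL comes from the step above, the helper names are my own, and the `vocab.txt` target name reflects what `BertTokenizer.from_pretrained(directory)` usually looks for inside a local directory.

```python
# Sketch of the manual workaround: fetch the vocab file once into the local
# cache directory so from_pretrained(path) can find it offline.
# Only the URL is from the post; helper names and the "vocab.txt" target
# filename are assumptions on my part.
import os
import urllib.request

VOCAB_URL = "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-chinese-vocab.txt"


def remote_name(url):
    """Filename part of the download URL."""
    return url.rsplit("/", 1)[-1]


def fetch_vocab(cache_dir, filename="vocab.txt"):
    """Download the vocab file into cache_dir unless it is already present."""
    os.makedirs(cache_dir, exist_ok=True)
    target = os.path.join(cache_dir, filename)
    if not os.path.exists(target):
        urllib.request.urlretrieve(VOCAB_URL, target)  # needs network access
    return target
```

Note that the model weights and config still have to be placed in the same directory for `BertModel.from_pretrained(path)` to work; this sketch only covers the vocab file the traceback complained about.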

Then edit functions.py: add the path above and change the corresponding call:

  #tokenizer = BertTokenizer.from_pretrained('bert-base-chinese', do_lower_case=True)
  tokenizer = BertTokenizer.from_pretrained(path)

gazlstm.py needs the same change:

  if self.use_bert:
      #self.bert_encoder = BertModel.from_pretrained('bert-base-chinese')
      self.bert_encoder = BertModel.from_pretrained(path)
      for p in self.bert_encoder.parameters():
          p.requires_grad = False

OK, finally solved: BERT now loads successfully (though this fix is crude, and on another machine I'd have to edit the code again, sigh).
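One way to avoid re-editing the hard-coded `C:\Users\yuyuan\...` path on every machine is to derive it from the current user's home directory. A small sketch, assuming the cache layout above; the directory names follow the post, the helper is mine:

```python
# Build the pretrained-BERT cache path portably instead of hard-coding
# a specific user's home directory. Directory names follow the post.
import os


def bert_cache_path(model_name="bert-base-chinese"):
    """Return <home>/.pytorch_pretrained_bert/<model_name> for the current user."""
    return os.path.join(
        os.path.expanduser("~"), ".pytorch_pretrained_bert", model_name
    )


path = bert_cache_path()
```

With this, the edited `from_pretrained(path)` calls in functions.py and gazlstm.py work unchanged on any machine where the files sit in the same place relative to the home directory.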

4. Continuing on, right after the program prints build batched crf..., another error appears (by this point I was about to lose it):

  cublas runtime error : the GPU program failed to execute at C:/ProgramData/Miniconda3/conda-bld/pytorch_1533096106539/work/aten/src/THC/THCBlas.cu:249

Fix: after a round of searching, it turned out to be a GPU/CUDA mismatch. My machine has an RTX 3050 Ti, which can only use the GPU with CUDA 11.0 or later installed, but to run PyTorch 0.4.1 I had installed CUDA 9.0.

The only way out is to upgrade CUDA. The PyTorch version matching CUDA 11.0 is 1.7.1, and a newer PyTorch is bound to cause some issues with this paper's code.

No choice: reinstall CUDA 11.0, the matching cuDNN, and the matching PyTorch.
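A check run before training would have surfaced this mismatch much earlier than a cublas error deep inside the forward pass. A minimal sketch, assuming the 11.0 minimum stated above; the parsing helpers are mine, and with torch installed you would feed in `torch.version.cuda`:

```python
# Hedged sketch: verify the CUDA toolkit version meets a minimum before
# launching GPU training. The "11.0" figure is the one from this post.
def parse_version(v):
    """Turn a dotted version string like '11.0' into a comparable int tuple."""
    return tuple(int(x) for x in v.split("."))


def cuda_ok(installed, minimum="11.0"):
    """True when the installed CUDA version is at least the required minimum."""
    return parse_version(installed) >= parse_version(minimum)


# With torch installed (torch.version.cuda is a string like "11.0" or None):
# assert torch.version.cuda and cuda_ok(torch.version.cuda), \
#     "This GPU needs a newer CUDA toolkit"
```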

5. The code runs again and training starts, but warnings flood the screen so fast that the training logs are unreadable:

  D:\undergraduation\LexiconAugmentedNER-master\model\gazlstm.py:151: UserWarning: masked_fill_ received a mask with dtype torch.uint8, this behavior is now deprecated, please use a mask with dtype torch.bool instead. (Triggered internally at ..\aten\src\ATen\native\cuda\LegacyDefinitions.cpp:28.)
    gaz_embeds = gaz_embeds_d.data.masked_fill_(gaz_mask.data, 0) #(b,l,4,g,ge) ge:gaz_embed_dim
  D:\undergraduation\LexiconAugmentedNER-master\model\crf.py:97: UserWarning: indexing with dtype torch.uint8 is now deprecated, please use a dtype torch.bool instead. (Triggered internally at ..\aten\src\ATen/native/IndexingUtils.h:25.)
    masked_cur_partition = cur_partition.masked_select(mask_idx)
  D:\undergraduation\LexiconAugmentedNER-master\model\crf.py:102: UserWarning: masked_scatter_ received a mask with dtype torch.uint8, this behavior is now deprecated, please use a mask with dtype torch.bool instead. (Triggered internally at ..\aten\src\ATen\native\cuda\LegacyDefinitions.cpp:72.)
    partition.masked_scatter_(mask_idx, masked_cur_partition)
  D:\undergraduation\LexiconAugmentedNER-master\model\crf.py:248: UserWarning: indexing with dtype torch.uint8 is now deprecated, please use a dtype torch.bool instead. (Triggered internally at ..\aten\src\ATen/native/IndexingUtils.h:25.)
    tg_energy = tg_energy.masked_select(mask.transpose(1,0))
  [W ..\aten\src\ATen\native\cuda\LegacyDefinitions.cpp:72] Warning: masked_scatter_ received a mask with dtype torch.uint8, this behavior is now deprecated, please use a mask with dtype torch.bool instead. (function masked_scatter__cuda)
  [W ..\aten\src\ATen\native\cuda\LegacyDefinitions.cpp:28] Warning: masked_fill_ received a mask with dtype torch.uint8, this behavior is now deprecated, please use a mask with dtype torch.bool instead. (function masked_fill__cuda)
  [W IndexingUtils.h:25] Warning: indexing with dtype torch.uint8 is now deprecated, please use a dtype torch.bool instead. (function expandTensors)

Everything below that just repeats the last three warnings.

Fix: in the return statement of the batchify_with_label function, change mask to mask.bool():

  # print(bert_seq_tensor.type())
  return gazs, word_seq_tensor, biword_seq_tensor, word_seq_lengths, label_seq_tensor, layer_gaz_tensor, gaz_count_tensor, gaz_chars_tensor, gaz_mask_tensor, gazchar_mask_tensor, mask.bool(), bert_seq_tensor, bert_mask

At this point one warning remains, pointing into gazlstm.py.

So also edit the get_tags function:

  gaz_mask = gaz_mask_input.unsqueeze(-1).repeat(1,1,1,1,self.gaz_emb_dim)
  # added line:
  gaz_mask = gaz_mask.bool()
  gaz_embeds = gaz_embeds_d.data.masked_fill_(gaz_mask.data, 0) #(b,l,4,g,ge) ge:gaz_embed_dim

OK, no more warnings; it runs cleanly!
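For reference, here is a toy demo of the uint8 to bool conversion applied in both files above. It requires torch; the tensor shapes are made-up small values, not the model's actual (b,l,4,g,ge) layout.

```python
# Toy illustration of the deprecation fix: masks for masked_fill should be
# torch.bool, not torch.uint8. Shapes here are illustrative only.
import torch

emb = torch.ones(2, 3)
mask_u8 = torch.tensor([[1, 0, 1], [0, 1, 0]], dtype=torch.uint8)

# Old style (warns on recent torch): emb.masked_fill_(mask_u8, 0)
# New style, as done in batchify_with_label and get_tags above:
filled = emb.masked_fill(mask_u8.bool(), 0)
```

The out-of-place `masked_fill` is used here so `emb` is left untouched; the repo's in-place `masked_fill_` behaves the same way with respect to the mask dtype.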
