
Building an embedding from a GloVe word-vector source file and loading it in PyTorch


The first step is to download the GloVe file.

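The file is plain text (.txt). Each line starts with a word, followed by 100 floats separated by spaces, so we open the file and process it one line at a time.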
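To make the line format concrete, here is a minimal sketch of parsing a single line. The helper name `parse_glove_line` and its `dim` check are illustrative assumptions, not part of the original post, and the sample line uses 3 values instead of 100 to keep it short.

```python
def parse_glove_line(line, dim=100):
    """Split one GloVe text line into (word, vector).

    Hypothetical helper: assumes the word comes first and is
    followed by exactly `dim` whitespace-separated floats.
    """
    parts = line.split()
    word = parts[0]
    vector = [float(num) for num in parts[1:]]
    if len(vector) != dim:
        raise ValueError(
            f"expected {dim} values for {word!r}, got {len(vector)}"
        )
    return word, vector


# A shortened sample line (3 dimensions instead of 100).
sample_word, sample_vector = parse_glove_line("的 0.1 -0.2 0.3\n", dim=3)
```

Checking the vector length up front is a design choice: it surfaces malformed lines (for example, a token that itself contains a space) at load time rather than later, when the vectors are stacked into a matrix.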

    import os


    def get_numpy_word_embed(word2ix):
        row = 0
        file = 'zhs_wiki_glove.vectors.100d.txt'
        path = '/home/socialbird/platform/aion-autonlp/Downloads'
        whole = os.path.join(path, file)
        words_embed = {}
        # encoding='utf-8' is an assumption about how the vectors file is stored
        with open(whole, mode='r', encoding='utf-8') as f:
            # Iterate lazily so only the lines actually used are read
            for line in f:
                # print(line)
                # print(len(line.split()))
                line_list = line.split()
                word = line_list[0]                    # first token is the word
                embed = line_list[1:]                  # remaining tokens are the vector
                embed = [float(num) for num in embed]
                words_embed[word] = embed
                # Stop after roughly the first 20,000 lines
                if row > 20000:
                    break
                row += 1
        # word2ix = {}
        ix2word = {ix: w for w, ix in word2ix.items()}
        id2emb = {}
        # (the source listing is cut off partway through the next line)
        for ix in range(l
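
The loop over vocabulary indices is where a per-index table of vectors would typically be assembled. The following is a minimal sketch of one way to do that, not the author's code: the function name `build_embedding_rows`, the random fallback for words missing from the GloVe file, and the assumption that `word2ix` maps words to contiguous indices `0..len(word2ix)-1` are all hypothetical.

```python
import random


def build_embedding_rows(word2ix, words_embed, dim=100, seed=0):
    """Return one vector per vocabulary index, ordered by index.

    Assumptions (not from the original post):
    - word2ix maps each word to a unique index in 0..len(word2ix)-1
    - words absent from words_embed get small random values
    """
    rng = random.Random(seed)
    ix2word = {ix: w for w, ix in word2ix.items()}
    rows = []
    for ix in range(len(word2ix)):
        word = ix2word[ix]
        if word in words_embed:
            rows.append(list(words_embed[word]))
        else:
            # Fallback for out-of-vocabulary words such as padding tokens
            rows.append([rng.uniform(-0.1, 0.1) for _ in range(dim)])
    return rows


# Toy example with 2-dimensional vectors.
demo_word2ix = {"<pad>": 0, "cat": 1, "dog": 2}
demo_words_embed = {"cat": [1.0, 2.0], "dog": [3.0, 4.0]}
demo_rows = build_embedding_rows(demo_word2ix, demo_words_embed, dim=2)
```

With PyTorch installed, rows like these can be converted with `torch.tensor(rows, dtype=torch.float32)` and passed to `torch.nn.Embedding.from_pretrained(...)`, whose `freeze` argument controls whether the vectors are updated during training.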