
Word Vectors: Loading word2vec and GloVe


1 Google pretrained 300-dimensional word vectors with word2vec on a news corpus. The file, GoogleNews-vectors-negative300.bin, is 3.39 GB after decompression.


It can be loaded with gensim, but the machine needs enough memory to hold it.

    # Load the word vectors trained by Google
    import gensim

    model = gensim.models.KeyedVectors.load_word2vec_format('GoogleNews-vectors-negative300.bin', binary=True)
    print(model['love'])
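
To get a feel for how much memory "enough" means, here is a rough back-of-the-envelope sketch. It assumes the model holds about 3,000,000 entries (the figure from the published model description, not stated in this post) and that each value is a 32-bit float. The function name `matrix_bytes` is made up for illustration.

```python
# Rough lower bound on the RAM needed to hold the GoogleNews vectors.
# Assumption: about 3,000,000 vocabulary entries, stored as 32-bit floats.
VOCAB_SIZE = 3_000_000   # assumed number of words and phrases
DIMENSIONS = 300         # vector size stated in the text
BYTES_PER_FLOAT32 = 4


def matrix_bytes(vocab_size, dimensions, bytes_per_value=BYTES_PER_FLOAT32):
    """Return the size in bytes of a dense vocab_size x dimensions matrix."""
    return vocab_size * dimensions * bytes_per_value


# Full vector matrix, without counting the vocabulary strings themselves.
full = matrix_bytes(VOCAB_SIZE, DIMENSIONS)
print("full matrix: {:.2f} GiB".format(full / 2**30))

# Keeping only the first 500,000 entries shrinks the matrix proportionally.
partial = matrix_bytes(500_000, DIMENSIONS)
print("first 500k:  {:.2f} GiB".format(partial / 2**30))
```

The estimate covers only the raw vectors, so actual usage will be somewhat higher. If memory is tight, `load_word2vec_format` accepts a `limit` argument that reads only the first N vectors from the file.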


2 Word vectors pretrained with GloVe can also be loaded with gensim, but one extra step is needed before loading; see the reference code below.

The 300-dimensional GloVe vectors are 5.25 GB.

    # To open GloVe vectors with gensim, a header line must be added at the
    # start of the file: <number of words> <vector dimension>
    import shutil
    from sys import platform

    import gensim


    # Count the lines in the file, which equals the number of words
    def getFileLineNums(filename):
        with open(filename, 'r', encoding='utf-8') as f:
            return sum(1 for _ in f)


    # Open the word vector file on Linux or Windows and add a line at the start
    def prepend_line(infile, outfile, line):
        with open(infile, 'r', encoding='utf-8') as old:
            with open(outfile, 'w', encoding='utf-8') as new:
                new.write(str(line) + "\n")
                shutil.copyfileobj(old, new)


    # Same result as prepend_line, but copies the file line by line
    def prepend_slow(infile, outfile, line):
        with open(infile, 'r', encoding='utf-8') as fin:
            with open(outfile, 'w', encoding='utf-8') as fout:
                fout.write(line + "\n")
                for row in fin:
                    fout.write(row)


    def load(filename):
        num_lines = getFileLineNums(filename)
        gensim_file = 'glove_model.txt'
        gensim_first_line = "{} {}".format(num_lines, 300)
        # Prepends the line.
        if platform == "linux" or platform == "linux2":
            prepend_line(filename, gensim_file, gensim_first_line)
        else:
            prepend_slow(filename, gensim_file, gensim_first_line)
        # Return the model so the caller can actually use it
        return gensim.models.KeyedVectors.load_word2vec_format(gensim_file)


    model = load('glove.840B.300d.txt')

The generated glove_model.txt is a file that gensim can open directly.
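
The header step can be tried on a tiny file before running it on the multi-gigabyte one. The sketch below is a variation on the idea, not the code above: it reads the vector dimension from the first line instead of hard-coding 300. The helper names `infer_header` and `add_header` and the toy rows are invented for illustration.

```python
import os
import tempfile


def infer_header(path):
    """Return (word_count, dimension) for a headerless GloVe-style text file."""
    count = 0
    dim = 0
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            if count == 0:
                # Each line is "<word> <v1> ... <vN>", so N = fields - 1.
                dim = len(line.rstrip("\n").split(" ")) - 1
            count += 1
    return count, dim


def add_header(infile, outfile):
    """Copy infile to outfile with a "<word_count> <dimension>" first line."""
    count, dim = infer_header(infile)
    with open(infile, "r", encoding="utf-8") as src, \
            open(outfile, "w", encoding="utf-8") as dst:
        dst.write("{} {}\n".format(count, dim))
        for line in src:
            dst.write(line)
    return count, dim


# Toy data standing in for a real GloVe file (values are made up).
toy_rows = ["the 0.1 0.2 0.3 0.4", "cat 0.5 0.6 0.7 0.8", "sat 0.9 1.0 1.1 1.2"]
workdir = tempfile.mkdtemp()
toy_in = os.path.join(workdir, "toy_glove.txt")
toy_out = os.path.join(workdir, "toy_glove_w2v.txt")
with open(toy_in, "w", encoding="utf-8") as f:
    f.write("\n".join(toy_rows) + "\n")

header = add_header(toy_in, toy_out)
with open(toy_out, "r", encoding="utf-8") as f:
    first_line = f.readline().strip()
print(first_line)
```

The output file has the same layout as glove_model.txt and can be passed to `KeyedVectors.load_word2vec_format` in the same way. Newer gensim releases (4.0 onward, going by its documentation) also accept `no_header=True` in that call, which may make the extra file unnecessary; check the installed version before relying on it.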


