
Converting a torchtext Vocab from version 0.4.1 to version 0.11


The torchtext Vocab class is not compatible across versions: a model I created under version 0.4.1 cannot be imported into version 0.11.

The code I used for the conversion is pasted below.

    #!/usr/bin/env python3
    # -*- coding: utf-8 -*-
    """
    Created on Fri Feb 11 15:36:11 2022
    @author: Yihang Zhou
    Contact: yihangjoe@foxmail.com
    https://github.com/Y-H-Joe/
    ####============================ description ==============================####
    The Vocab in OpenNMT was trained with pytorch 0.4.1 (torchtext 0.6.0). Now I want
    to use the trained model in pytorch 1.10.2 (torchtext 0.11.2). After reading the
    source code of both versions, I wrote this script to do the conversion.
    =================================== input =====================================
    =================================== output ====================================
    ================================= parameters ==================================
    =================================== example ===================================
    =================================== warning ===================================
    ####=======================================================================####
    """
    import torch  ## 1.10.2

    ## the checkpoint was trained under pytorch 0.4.1
    pretrained_smi_dp = 'trained_models/STEREO_separated_augm_model_average_20.pt'
    pretrained_smi_state_dict = torch.load(pretrained_smi_dp)

    """
    Extract from torchtext 0.6.0:

    class Vocab(object):
        Defines a vocabulary object that will be used to numericalize a field.

        Attributes:
            freqs: A collections.Counter object holding the frequencies of tokens
                in the data used to build the Vocab.
            stoi: A collections.defaultdict instance mapping token strings to
                numerical identifiers.
            itos: A list of token strings indexed by their numerical identifiers.
    """
    ## index-to-token list of the old Vocab; adjust the indexing to match your model
    v_old = pretrained_smi_state_dict['vocab'][0][1].itos

    """
    How to create a Vocab object in torchtext 0.11.2:

    >>> from torchtext.vocab import vocab
    >>> from collections import Counter, OrderedDict
    >>> counter = Counter(["a", "a", "b", "b", "b"])
    >>> sorted_by_freq_tuples = sorted(counter.items(), key=lambda x: x[1], reverse=True)
    >>> ordered_dict = OrderedDict(sorted_by_freq_tuples)
    >>> v1 = vocab(ordered_dict)

    and:

    In [96]: ordered_dict
    Out[96]: OrderedDict([('b', 3), ('a', 2)])

    So we just need to construct a new ordered_dict from v_old.
    """
    from torchtext.vocab import vocab  # 0.11.2
    from collections import OrderedDict

    ## The real frequencies are not needed. Give each token a made-up, descending
    ## count (len(v_old) down to 1) so the OrderedDict keeps the old index order
    ## and every count stays >= 1.
    new_sorted_by_freq_tuples = [(token, len(v_old) - index) for index, token in enumerate(v_old)]
    new_ordered_dict = OrderedDict(new_sorted_by_freq_tuples)
    v_new = vocab(new_ordered_dict)

    """
    Check:

    In [104]: v_new.get_itos() == v_old
    Out[104]: True
    """

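The basic idea is to pull the itos/stoi out of the old-version object, pass it through a dict, and rebuild it in the new version. I worked this out by digging through the source code by hand.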
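The pseudo-frequency trick can be tried without torchtext or a checkpoint. The following is a minimal sketch using only the standard library; `itos_to_ordered_dict` and the toy token list are hypothetical names invented for illustration, not part of torchtext or of the script above. It also guards against duplicate tokens, which a dict would silently merge.

```python
from collections import OrderedDict


def itos_to_ordered_dict(itos):
    """Map an index-to-token list to an OrderedDict of (token, pseudo-frequency).

    Hypothetical helper, not part of torchtext. The counts are invented:
    they run from len(itos) down to 1, so they are all >= 1 and the dict
    looks as if it were sorted by frequency.
    """
    if len(set(itos)) != len(itos):
        # A dict would silently merge repeated tokens and shift later indices.
        raise ValueError("itos contains duplicate tokens")
    n = len(itos)
    return OrderedDict((token, n - index) for index, token in enumerate(itos))


# Toy stand-in for `v_old`; the real list comes from the checkpoint.
toy_itos = ["<unk>", "<pad>", "C", "c", "(", ")"]
toy_ordered_dict = itos_to_ordered_dict(toy_itos)

# Insertion order equals the old index order, so token -> index is unchanged.
toy_stoi = {token: index for index, token in enumerate(toy_ordered_dict)}

print(list(toy_ordered_dict.items())[:2])  # [('<unk>', 6), ('<pad>', 5)]
print(toy_stoi["C"])                       # 2
```

One difference the script does not handle: according to the 0.6.0 docstring quoted in the script, the old `stoi` is a `defaultdict`, so unknown tokens still get an id. As far as I know, the 0.11 `Vocab` raises an error on unknown tokens until `set_default_index` is called, for example `v_new.set_default_index(v_new['<unk>'])`, assuming the unknown token in your vocabulary is spelled `<unk>`.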
