
object has no attribute 'get_vocab'


Topic

'''
Description: object has no attribute 'get_vocab'
Author: 365JHWZGo
Date: 2021-12-07 11:45:13
LastEditors: 365JHWZGo
LastEditTime: 2021-12-07 12:45:34
'''

Today, while writing a news classifier, I found that the code from the Heima course no longer runs.

Errors

1. text_classification does not exist

train_dataset, test_dataset = text_classification.DATASETS['AG_NEWS'](root=load_data_path)

Solution

train_dataset, test_dataset = torchtext.datasets.AG_NEWS(root=path,split=('train',"test"))

2. There is no get_vocab method

Solution

First, look at how AG_NEWS is declared by opening ag_news.py:

def AG_NEWS(root, split):
    path = download_from_url(URL[split], root=root,
                             path=os.path.join(root, split + ".csv"),
                             hash_value=MD5[split],
                             hash_type='md5')
    return _RawTextIterableDataset(DATASET_NAME, NUM_LINES[split],
                                   _create_data_from_csv(path))

Next, step into the _RawTextIterableDataset class, which is defined in datasets_utils.py:

class _RawTextIterableDataset(torch.utils.data.IterableDataset):
    """Defines an abstraction for raw text iterable datasets.
    """

    def __init__(self, description, full_num_lines, iterator):
        """Initiate the dataset abstraction.
        """
        super(_RawTextIterableDataset, self).__init__()
        self.description = description
        self.full_num_lines = full_num_lines
        self._iterator = iterator
        self.num_lines = full_num_lines
        self.current_pos = None
        

    def __iter__(self):
        return self

    def __next__(self):
        if self.current_pos == self.num_lines - 1:
            raise StopIteration
        item = next(self._iterator)
        if self.current_pos is None:
            self.current_pos = 0
        else:
            self.current_pos += 1
        return item

    def __len__(self):
        return self.num_lines

    def pos(self):
        """
        Returns current position of the iterator. This returns None
        if the iterator hasn't been used yet.
        """
        return self.current_pos

    def __str__(self):
        return self.description    
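
One detail of the class above matters for what follows: `__iter__` returns `self`, so the dataset can only be walked once. The sketch below illustrates this with a hypothetical stand-in class, `OneShotDataset`, which copies only the `__iter__`/`__next__` logic shown above. It is not the torchtext class itself, and the sample rows are made up.

```python
class OneShotDataset:
    """Hypothetical stand-in that mirrors the __iter__/__next__ logic above."""

    def __init__(self, full_num_lines, iterator):
        self._iterator = iterator
        self.num_lines = full_num_lines
        self.current_pos = None

    def __iter__(self):
        # Returning self means every loop shares the same position
        return self

    def __next__(self):
        if self.current_pos == self.num_lines - 1:
            raise StopIteration
        item = next(self._iterator)
        if self.current_pos is None:
            self.current_pos = 0
        else:
            self.current_pos += 1
        return item


# Made-up sample rows in (label, text) form
rows = [(3, "Wall St. rallies"), (4, "New chip unveiled")]
dataset = OneShotDataset(len(rows), iter(rows))

first_pass = list(dataset)   # walks all rows
second_pass = list(dataset)  # nothing left: the position is already at the end
```

Any helper that reads the whole dataset therefore leaves it exhausted. Calling torchtext.datasets.AG_NEWS again should give a fresh iterator for training.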

Sure enough, there is no get_vocab function, so let's implement its functionality ourselves.
What is needed here is the number of distinct words in train_dataset.

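import string  # add at the top of datasets_utils.py


def get_vocab(self):
    """Return the number of distinct words in the dataset.

    Add this as a method of _RawTextIterableDataset.
    Note: it consumes the underlying iterator.
    """
    # Translation table that deletes all ASCII punctuation (built once)
    remove = str.maketrans("", "", string.punctuation)
    vocab = set()
    for item in self:
        # Each item is a (label, text) tuple; index 1 is the text
        sub_content = item[1].lower()
        vocab.update(sub_content.translate(remove).split())
    return len(vocab)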
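
If editing the installed torchtext source is undesirable, the same count can be computed outside the class. This is a minimal sketch, assuming each sample is a (label, text) tuple as in the method above. The name `count_vocab` and the sample rows are hypothetical and only for illustration.

```python
import string


def count_vocab(samples):
    """Count distinct lowercase, punctuation-free words in (label, text) pairs."""
    # Translation table that deletes all ASCII punctuation
    strip_punct = str.maketrans("", "", string.punctuation)
    vocab = set()
    for _label, text in samples:
        vocab.update(text.lower().translate(strip_punct).split())
    return len(vocab)


# Made-up sample rows in (label, text) form
samples = [
    (3, "Stocks rise. Stocks fall!"),
    (4, "New chip, new phone"),
]
vocab_size = count_vocab(samples)
```

Because the function accepts any iterable, it could be called as `count_vocab(train_dataset)` without touching the library files. Like the method above, it would consume the dataset iterator.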

Result

Run the following test code:

print(train_dataset.get_vocab())

[Screenshot of the output]
