To download a specific dataset/model, use the nltk.download() function. For example, if you want to download the punkt sentence tokenizer, use:
$ python3
>>> import nltk
>>> nltk.download('punkt')
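The downloader can also be run from the shell, without opening an interpreter, via the nltk.downloader module:
$ python3 -m nltk.downloader punkt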
If you are unsure which data/model you need, you can start with the basic list of data and models:
>>> import nltk
>>> nltk.download('popular')
It will download the list of "popular" resources.
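By default the data is stored in a per-user nltk_data directory (typically ~/nltk_data on macOS/Linux, as in the search paths shown further below). If you want it elsewhere, nltk.download() also accepts a download_dir argument, and the chosen directory then has to be on nltk.data.path so NLTK can find it. A minimal sketch, using a made-up /tmp/nltk_data path:
>>> import nltk
>>> nltk.download('popular', download_dir='/tmp/nltk_data')  # hypothetical target directory
>>> nltk.data.path.append('/tmp/nltk_data')                  # make NLTK search it at load time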
EDITED
In case anyone wants to avoid the errors caused by downloading the larger datasets from nltk, as described in https://stackoverflow.com/a/38135306/610569:
$ rm /Users/<your_username>/nltk_data/corpora/panlex_lite.zip
$ rm -r /Users/<your_username>/nltk_data/corpora/panlex_lite
$ python
>>> import nltk
>>> dler = nltk.downloader.Downloader()
>>> dler._update_index()
>>> dler._status_cache['panlex_lite'] = 'installed' # Trick the index into treating panlex_lite as if it's already installed.
>>> dler.download('popular')
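As a quick sanity check (just a sketch, assuming the Downloader helpers is_installed()/status() behave as in current NLTK releases), you can ask the same downloader instance whether a package from the collection ended up installed:
>>> dler.is_installed('punkt')   # True once the 'popular' download has finished
True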
UPDATE
From v3.2.5 onwards, NLTK gives a more informative error message when a resource cannot be found in nltk_data, e.g.:
>>> from nltk import word_tokenize
>>> word_tokenize('x')
Traceback (most recent call last):
File "", line 1, in
File "/Users/l/alvas/git/nltk/nltk/tokenize/__init__.py", line 128, in word_tokenize
sentences = [text] if preserve_line else sent_tokenize(text, language)
File "/Users//alvas/git/nltk/nltk/tokenize/__init__.py", line 94, in sent_tokenize
tokenizer = load('tokenizers/punkt/{0}.pickle'.format(language))
File "/Users/alvas/git/nltk/nltk/data.py", line 820, in load
opened_resource = _open(resource_url)
File "/Users/alvas/git/nltk/nltk/data.py", line 938, in _open
return find(path_, path + ['']).open()
File "/Users/alvas/git/nltk/nltk/data.py", line 659, in find
raise LookupError(resource_not_found)
LookupError:
**********************************************************************
Resource punkt not found.
Please use the NLTK Downloader to obtain the resource:
>>> import nltk
>>> nltk.download('punkt')
Searched in:
- '/Users/alvas/nltk_data'
- '/usr/share/nltk_data'
- '/usr/local/share/nltk_data'
- '/usr/lib/nltk_data'
- '/usr/local/lib/nltk_data'
- ''
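Because the missing resource surfaces as a LookupError, a script can also catch it and fetch the data on demand instead of failing. This is only a sketch of that pattern, not part of the answer above:
# Sketch: download 'punkt' lazily, the first time tokenization fails.
import nltk
from nltk import word_tokenize

def tokenize(text):
    try:
        return word_tokenize(text)
    except LookupError:        # raised when a required resource (e.g. punkt) is missing
        nltk.download('punkt')
        return word_tokenize(text)

print(tokenize('This is a test sentence.'))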