NLTK (Natural Language Toolkit) is one of the most widely used Python libraries in the NLP field.
pip install --upgrade nltk
import nltk
nltk.download('punkt')  # data for English word tokenization and sentence splitting
nltk.download('stopwords')  # English stop-word list
I ran the Python code above to download the data sets, but it kept failing with:
[nltk_data] Error loading punkt: <urlopen error [Errno 8] nodename nor
[nltk_data] servname provided, or not known>
[nltk_data] Error loading stopwords: <urlopen error [Errno 8] nodename
[nltk_data] nor servname provided, or not known>
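The "nodename nor servname provided, or not known" message is a name-resolution failure: the machine could not turn the download host into an IP address. A minimal sketch for checking this outside NLTK, using only the standard library, is shown here. The helper name `can_resolve` is hypothetical, and the host mentioned in the comment is an assumption about where NLTK's default download index is served from.

```python
import socket


def can_resolve(host: str, port: int = 443) -> bool:
    """Return True if the local resolver can turn `host` into an address."""
    try:
        socket.getaddrinfo(host, port)
        return True
    except socket.gaierror:
        # Name resolution failed; this is the same class of failure that
        # urlopen reports as "nodename nor servname provided, or not known".
        return False


# Example call (assumed host, not taken from the error output above):
# can_resolve("raw.githubusercontent.com")
```

If the call returns False for the download host, the problem is likely with DNS or the network rather than with NLTK itself, and a manual download is a reasonable workaround.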
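In the end I downloaded the data manually from GitHub, taking everything under packages.
After downloading, I put it in a local folder; in my case that was /Users/sunwenjun/anaconda3/envs/python310/nltk_data/.
Note that some of the archives need to be unzipped.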
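Unzipping the archives by hand gets tedious when there are many of them. The sketch below walks a data folder and extracts every .zip into the folder that contains it. The function name `unzip_archives` is hypothetical, and whether every archive really needs extracting depends on the resource, so treat it as a convenience under those assumptions.

```python
import zipfile
from pathlib import Path


def unzip_archives(data_dir: str) -> list[Path]:
    """Extract every .zip found under data_dir next to the archive itself."""
    extracted = []
    # sorted() materializes the list first, so files created while
    # extracting are not picked up during the same walk.
    for archive in sorted(Path(data_dir).rglob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(archive.parent)
        extracted.append(archive)
    return extracted
```

For example, `unzip_archives("/Users/sunwenjun/anaconda3/envs/python310/nltk_data")` would return the list of archives it unpacked; the original .zip files are left in place.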
from nltk.data import find
print(find('punkt')) # /Users/sunwenjun/anaconda3/envs/python310/nltk_data/punkt
print(find('tokenizers')) # /Users/sunwenjun/anaconda3/envs/python310/nltk_data/tokenizers
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

input_string = 'Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks'

# Tokenize
word_tokens = word_tokenize(input_string)
print(word_tokens)  # ['Retrieval-Augmented', 'Generation', 'for', 'Knowledge-Intensive', 'NLP', 'Tasks']

# Remove stop words
stop_words = set(stopwords.words('english'))
filtered_words = [w for w in word_tokens if w.lower() not in stop_words]
print(filtered_words)  # ['Retrieval-Augmented', 'Generation', 'Knowledge-Intensive', 'NLP', 'Tasks']

# Stem
ps = PorterStemmer()
ps_words = [ps.stem(w) for w in filtered_words]
print(ps_words)  # ['retrieval-aug', 'gener', 'knowledge-intens', 'nlp', 'task']
LookupError:
**********************************************************************
  Resource punkt not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('punkt')

  For more information see: https://www.nltk.org/data.html

  Attempted to load corpora/punkt

  Searched in:
    - '/Users/sunwenjun/nltk_data'
    - '/Users/sunwenjun/anaconda3/envs/python310/nltk_data'
    - '/Users/sunwenjun/anaconda3/envs/python310/share/nltk_data'
    - '/Users/sunwenjun/anaconda3/envs/python310/lib/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
**********************************************************************
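The traceback lists the folders NLTK searched. A resource is only found if it sits under one of them at the relative path NLTK asks for (in the nltk_data repository, Punkt is shipped under tokenizers, so the usual location is tokenizers/punkt inside the data folder). The sketch below is a simplified imitation of that lookup, not NLTK's own code; the helper name `locate_resource` is hypothetical, and it only checks unpacked files and folders.

```python
from pathlib import Path


def locate_resource(resource: str, search_dirs: list[str]) -> list[Path]:
    """Return every existing <search_dir>/<resource> path, in search order."""
    hits = []
    for base in search_dirs:
        candidate = Path(base) / resource
        if candidate.exists():
            hits.append(candidate)
    return hits
```

Calling it with `"tokenizers/punkt"` and the folders from the "Searched in" list shows whether the manually downloaded data landed where it is expected. An empty result suggests the folder layout is off, for instance punkt placed directly under nltk_data instead of under nltk_data/tokenizers. If the data lives in some other folder, that folder can be appended to `nltk.data.path` before tokenizing.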