
Installing and using nltk: nltk stopwords


Natural Language Toolkit (NLTK) is a natural language processing toolkit and one of the most widely used Python libraries in the NLP field.

1. Install nltk

pip install --upgrade nltk

2. Install nltk_data

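import nltk
nltk.download('punkt')      # English tokenizer models (sentence splitting, used by word_tokenize)
nltk.download('stopwords')  # stopword lists (English and other languages)
# Recent NLTK releases (3.9+) load 'punkt_tab' for word_tokenize instead of 'punkt'
nltk.download('punkt_tab')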

Downloading the data with the Python code above kept failing with:

[nltk_data] Error loading punkt: <urlopen error [Errno 8] nodename nor
[nltk_data]     servname provided, or not known>
[nltk_data] Error loading stopwords: <urlopen error [Errno 8] nodename
[nltk_data]     nor servname provided, or not known>
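The "[Errno 8] nodename nor servname provided, or not known" message is what macOS reports when hostname resolution (getaddrinfo) fails, so the download failed at the DNS/network level rather than inside NLTK. A minimal stdlib sketch to check this; can_resolve is a hypothetical helper, and the host raw.githubusercontent.com is an assumption based on NLTK's default index URL (nltk.downloader.Downloader.DEFAULT_URL, worth verifying in your installed version).

```python
import socket


def can_resolve(host: str, port: int = 443) -> bool:
    """Return True if `host` resolves via the system resolver, else False.

    Only name resolution is tested; no connection is opened.
    """
    try:
        socket.getaddrinfo(host, port)
        return True
    except socket.gaierror:
        # "[Errno 8] nodename nor servname provided" is raised as a gaierror on macOS
        return False
```

If can_resolve("raw.githubusercontent.com") returns False, the problem lies in DNS or network access (proxy, firewall, or a blocked domain), which is why downloading the files manually works around it.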

In the end I downloaded the data manually from GitHub, taking everything in the packages directory of the nltk_data repository.
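After downloading, put the files in a local folder so that it directly contains the subfolders from packages (corpora, tokenizers, and so on). I used /Users/sunwenjun/anaconda3/envs/python310/nltk_data/. Note that some of the archives (.zip files) must be extracted.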
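Since some archives must be extracted, the unzipping can be scripted. A minimal stdlib sketch with a hypothetical helper unzip_all; it assumes that each archive, such as tokenizers/punkt.zip, contains a top-level folder of the same name (punkt/), so extracting into the archive's parent folder yields tokenizers/punkt/. Check one archive before relying on that assumption.

```python
import zipfile
from pathlib import Path


def unzip_all(data_dir: str) -> list[Path]:
    """Extract every .zip under data_dir into the folder that holds it.

    Archives whose target folder (archive path without .zip) already
    exists are skipped. Returns the archives that were extracted.
    """
    extracted = []
    # sorted() materializes the list before any extraction adds new files
    for archive in sorted(Path(data_dir).rglob("*.zip")):
        target = archive.with_suffix("")  # tokenizers/punkt.zip -> tokenizers/punkt
        if target.exists():
            continue
        with zipfile.ZipFile(archive) as zf:
            # Assumption: the archive holds a top-level folder named like the archive
            zf.extractall(archive.parent)
        extracted.append(archive)
    return extracted
```

For example, unzip_all("/Users/sunwenjun/anaconda3/envs/python310/nltk_data") extracts everything once; running it again does nothing because the extracted folders already exist.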

from nltk.data import find
print(find('punkt')) # /Users/sunwenjun/anaconda3/envs/python310/nltk_data/punkt
print(find('tokenizers')) # /Users/sunwenjun/anaconda3/envs/python310/nltk_data/tokenizers


3. Using nltk

from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

input_string = 'Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks'

# Tokenize into words
word_tokens = word_tokenize(input_string)
print(word_tokens)  # ['Retrieval-Augmented', 'Generation', 'for', 'Knowledge-Intensive', 'NLP', 'Tasks']

# Remove stopwords (compare in lowercase, since NLTK's stopword list is lowercase)
stop_words = set(stopwords.words('english'))
filtered_words = [w for w in word_tokens if w.lower() not in stop_words]
print(filtered_words)  # ['Retrieval-Augmented', 'Generation', 'Knowledge-Intensive', 'NLP', 'Tasks']

# Stem each word with the Porter stemmer (its output is lowercased)
ps = PorterStemmer()
ps_words = [ps.stem(w) for w in filtered_words]
print(ps_words)  # ['retrieval-aug', 'gener', 'knowledge-intens', 'nlp', 'task']
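The three steps (tokenize, remove stopwords, stem) can be wrapped in one function. A minimal sketch with a hypothetical helper preprocess that takes the tokenizer, stopword list and stemmer as parameters; the defaults (str.split and str.lower) are only stand-ins so it runs without nltk_data and are not equivalent to NLTK's tokenizer or stemmer.

```python
from typing import Callable, Iterable


def preprocess(
    text: str,
    tokenize: Callable[[str], list[str]] = str.split,
    stop_words: Iterable[str] = (),
    stem: Callable[[str], str] = str.lower,
) -> list[str]:
    """Tokenize text, drop stopwords case-insensitively, then stem each token."""
    stops = {w.lower() for w in stop_words}
    tokens = tokenize(text)
    kept = [t for t in tokens if t.lower() not in stops]
    return [stem(t) for t in kept]
```

With NLTK and its data installed, the equivalent of the code above would be preprocess(input_string, word_tokenize, stopwords.words('english'), PorterStemmer().stem). Passing the components in keeps the pipeline testable and makes it easy to swap in another stemmer, for example SnowballStemmer('english').stem from nltk.stem.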

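4. Where nltk_data can be stored

When a resource is missing, NLTK raises a LookupError that lists every directory it searched, in order. Putting the data in any one of these directories works: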

LookupError: 
**********************************************************************
  Resource punkt not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('punkt')
  
  For more information see: https://www.nltk.org/data.html

  Attempted to load corpora/punkt

  Searched in:
    - '/Users/sunwenjun/nltk_data'
    - '/Users/sunwenjun/anaconda3/envs/python310/nltk_data'
    - '/Users/sunwenjun/anaconda3/envs/python310/share/nltk_data'
    - '/Users/sunwenjun/anaconda3/envs/python310/lib/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
**********************************************************************
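The search order in this message comes from nltk.data.path: directories from the NLTK_DATA environment variable (split on os.pathsep) come first, then ~/nltk_data, then nltk_data, share/nltk_data and lib/nltk_data under the Python prefix (here the python310 conda environment), then the system directories. Below is a simplified stdlib imitation of "the first directory that contains the resource wins", with hypothetical helper names; it is not NLTK's actual nltk.data.find, which also looks inside .zip archives.

```python
import os
from pathlib import Path
from typing import Iterable, Optional


def paths_from_env() -> list[str]:
    """Directories listed in NLTK_DATA, skipping empty entries."""
    return [p for p in os.environ.get("NLTK_DATA", "").split(os.pathsep) if p]


def first_match(search_paths: Iterable[str], resource: str) -> Optional[Path]:
    """Return the first search directory's copy of `resource`, or None.

    `resource` uses forward slashes, e.g. "corpora/stopwords".
    Simplification: only plain files/folders are checked, not .zip archives.
    """
    for base in search_paths:
        candidate = Path(base, *resource.split("/"))
        if candidate.exists():
            return candidate
    return None
```

For example, first_match(nltk.data.path, "corpora/stopwords") shows which copy NLTK is likely to pick up. To use a directory outside this list, call nltk.data.path.append("/your/dir") in code, or set NLTK_DATA before starting Python, because the list is built when nltk is imported.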
This article was contributed by a user; if you repost it, please credit the source: [wpsshop blog]
