
Natural language processing with nltk: handling stopwords


A test script for stop-word removal with the nltk library. As a first test, it is run on a single string:

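import pandas as pd  # only needed for the commented-out read_csv line below
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

# Download the stop-word lists and the Punkt tokenizer data (only needed once).
# Recent NLTK releases may ask for 'punkt_tab' instead of 'punkt'.
nltk.download('stopwords')
nltk.download('punkt')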


#example_sent= pd.read_csv('D:/set/PubMed/1813/1-grams-1813.tsv')
example_sent = "the respecting  Spasmodic SecondaryVenereal off from Fluids	portions partly Nerve Example	some Natives  Metacarpal Contracted Constitutions	Instance jat by severe double Appendix contained Joints Disorders <BOS> Tumour Vascular Tongue Bone case Liver Account Diseases History Explanation A <EOS> Soldiery Human Brain betweenHumor operation , cyst Tabular Radial attended situated Inflammation Puberty attached sawing evacuating Dissection DiseaseMouth Groin Some Bones cases circumstances posterior Cataract	intoStrangulatedAqueous Observations . was to which Aneurism Paralysis	beneficial	Eyes Opium Ossium Effects Hemorrhage Appearance succeeded On a with Synopsis Fon in successfully"
stop_words = set(stopwords.words('english'))
word_tokens = word_tokenize(example_sent)
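# Keep only the tokens that are not stop words (list-comprehension form)
filtered_sentence = [w for w in word_tokens if w not in stop_words]

# Equivalent explicit loop. The NLTK English list is lowercase, so capitalized
# tokens such as "A" or "On" pass through this exact-match test.
filtered_sentence = []
for w in word_tokens:
    if w not in stop_words:
        filtered_sentence.append(w)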
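The test string above contains capitalized words such as "A", "On" and "Some", which an exact-match test against a lowercase stop list does not remove. Below is a minimal sketch of a case-insensitive variant. To keep it runnable without any downloads, it makes two assumptions that are not in the original script: `STOP_WORDS` is a small hand-written stand-in for `stopwords.words('english')`, and `str.split()` stands in for `word_tokenize`. The helper name `remove_stop_words` is also hypothetical.

```python
# Hypothetical stand-in for stopwords.words('english'); the real NLTK list is much longer.
STOP_WORDS = {"the", "a", "on", "some", "in", "with", "to", "was", "which", "by", "from", "off"}


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop tokens whose lowercase form is in stop_words; keep the original casing of the rest."""
    return [tok for tok in tokens if tok.lower() not in stop_words]


# Whitespace splitting stands in for word_tokenize so the sketch needs no corpus downloads.
tokens = "On a Liver with Some Bones in the Brain".split()
filtered = remove_stop_words(tokens)
print(filtered)  # ['Liver', 'Bones', 'Brain']
```

With the real NLTK data, the same comparison can be applied by passing `set(stopwords.words('english'))` as `stop_words` and the output of `word_tokenize` as `tokens`.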