A test script for stop word removal with the NLTK natural language processing library. It first runs on a single string:
import pandas as pd
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

# Fetch the stop word lists and the tokenizer model (no-op if already present).
nltk.download('stopwords')
nltk.download('punkt')

# Alternative input from a file, left disabled in the original:
#example_sent= pd.read_csv('D:/set/PubMed/1813/1-grams-1813.tsv')
example_sent = "the respecting Spasmodic SecondaryVenereal off from Fluids portions partly Nerve Example some Natives Metacarpal Contracted Constitutions Instance jat by severe double Appendix contained Joints Disorders <BOS> Tumour Vascular Tongue Bone case Liver Account Diseases History Explanation A <EOS> Soldiery Human Brain betweenHumor operation , cyst Tabular Radial attended situated Inflammation Puberty attached sawing evacuating Dissection DiseaseMouth Groin Some Bones cases circumstances posterior Cataract intoStrangulatedAqueous Observations . was to which Aneurism Paralysis beneficial Eyes Opium Ossium Effects Hemorrhage Appearance succeeded On a with Synopsis Fon in successfully"

# A set gives fast membership tests during filtering.
stop_words = set(stopwords.words('english'))
word_tokens = word_tokenize(example_sent)

# Keep only the tokens that are not stop words.
filtered_sentence = [w for w in word_tokens if w not in stop_words]

# The same filter as an explicit loop. The source is cut off after "if w";
# the rest of the loop is assumed to mirror the list comprehension above.
filtered_sentence = []
for w in word_tokens:
    if w not in stop_words:
        filtered_sentence.append(w)
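One thing to watch with the script above: NLTK's English stop word list is all lowercase, so capitalized tokens in the sample string such as "A", "On" and "Some" pass the filter unchanged. A minimal sketch of a case-insensitive variant follows. The helper name `remove_stopwords` and the small stand-in stop word set are assumptions made here so the example runs without downloading NLTK data; in the real script the set would be `set(stopwords.words('english'))`.

```python
# Stand-in stop word set (an assumption for illustration only).
# In the script above this would be set(stopwords.words('english')).
STOP_WORDS = {"a", "on", "some", "the", "with", "in", "to", "was", "which"}


def remove_stopwords(tokens, stop_words=STOP_WORDS):
    """Return the tokens whose lowercased form is not a stop word.

    The original casing of the kept tokens is preserved.
    """
    return [t for t in tokens if t.lower() not in stop_words]


# A few tokens taken from the sample string.
tokens = ["Some", "Bones", "On", "a", "Synopsis", "A", "Cataract"]
filtered = remove_stopwords(tokens)
print(filtered)
```

Lowercasing only for the comparison keeps the surviving tokens in their original form, which matters if capitalization is meaningful downstream (for example, the medical terms in this sample).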