You only read the first line of the file: line = myfile.readline(). You need to iterate over every line in the file. One way is:

    with open(fileName, 'r') as myfile:
        for line in myfile:
            # the rest of your code here, i.e.:
            stop_words = set(stopwords.words("english"))
            words = word_tokenize(line)
Also, you have this loop:

    for w in words:
        for n in words:
            if n not in stop_words:
                filtered_sentence.append(' ' + n)

But you will notice that the w defined in the outermost loop is never used inside it. You should be able to delete it and just write:

    for n in words:
        if n not in stop_words:
            filtered_sentence.append(' ' + n)
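As a side note, the same filtering step can be written as a list comprehension. This is a minimal self-contained sketch: the stop_words set here is a tiny hand-made one (not NLTK's English list), and str.split() stands in for word_tokenize.

```python
# Tiny hand-made stop word set, for illustration only.
stop_words = {"is", "a", "of"}
words = "This is a sample line of text".split()
# Keep every word that is not a stop word.
filtered_sentence = [w for w in words if w not in stop_words]
print(filtered_sentence)  # -> ['This', 'sample', 'line', 'text']
```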
Edit:

    import nltk
    from nltk.corpus import stopwords
    from nltk.tokenize import word_tokenize

    def stop_Words(fileName, fileName_out):
        # Build the stop word set once, not on every line.
        stop_words = set(stopwords.words("english"))
        with open(fileName_out, 'w') as file_out, open(fileName, 'r') as myfile:
            for line in myfile:
                words = word_tokenize(line)
                filtered_sentence = [""]
                for n in words:
                    if n not in stop_words:
                        filtered_sentence.append(' ' + n)
                file_out.writelines(filtered_sentence + ["\n"])
        print("All Done SW")
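For reference, here is a self-contained sketch of the same read-filter-write pattern without the NLTK dependency. The function name, str.split() (standing in for word_tokenize), and the tiny hand-made stop word set are all illustrative stand-ins, not part of the original answer.

```python
import os
import tempfile

def filter_stop_words(file_in, file_out_path):
    # Tiny hand-made stop word set (illustrative; not NLTK's English list).
    stop_words = {"is", "a", "the"}
    with open(file_in) as src, open(file_out_path, "w") as dst:
        for line in src:
            # str.split() stands in for word_tokenize here.
            filtered = [w for w in line.split() if w not in stop_words]
            dst.write(" ".join(filtered) + "\n")

# Example usage with throwaway files in a temp directory:
tmp = tempfile.mkdtemp()
src_path = os.path.join(tmp, "in.txt")
out_path = os.path.join(tmp, "out.txt")
with open(src_path, "w") as f:
    f.write("this is a test\nthe cat sat\n")
filter_stop_words(src_path, out_path)
print(open(out_path).read())  # -> "this test\ncat sat\n"
```

Using a with block for both files guarantees the output file is flushed and closed even if an exception is raised mid-loop.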