
Simple Word Frequency Counting in Python

Author: 小蓝xlanll | 2024-03-28 20:57:05


I ran the code in IPython Notebook.

1. The basic skeleton: read from one file, write to another

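```python
in_file = 'input.txt'    # example input path
out_file = 'output.txt'  # example output path

with open(in_file) as f:          # read the input file line by line
    for line in f:
        continue                  # process each line here

with open(out_file, 'w') as out:  # open the output file for writing
    out.write('')                 # write the results here
```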

2. A rough template for simple word frequency counting

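```python
in_file = 'input.txt'    # example input path
out_file = 'output.txt'  # example output path

def count(in_file, out_file):
    # Read the file and count word frequencies
    word_count = {}  # dictionary mapping each word to its frequency
    with open(in_file) as f:
        for line in f:
            words = line.strip().split(" ")
            for word in words:
                if word in word_count:
                    word_count[word] += 1
                else:
                    word_count[word] = 1
    with open(out_file, 'w') as out:  # open the output file
        for word in word_count:
            print(word, word_count[word])  # print each key and its value
            out.write(word + "--" + str(word_count[word]) + "\n")  # write to the file

count(in_file, out_file)
```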
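The if/else counting loop in the template is the pattern that the standard library's `collections.Counter` packages up. As a minimal alternative sketch (a swapped-in technique, not the author's code), the same template can be written as below. It keeps the `split(" ")` tokenization and the `word--count` output format; `count_words` and `count_to_file` are hypothetical names chosen for illustration.

```python
from collections import Counter


def count_words(lines):
    """Count space-separated words over an iterable of lines, such as an open file."""
    word_count = Counter()
    for line in lines:
        # Same tokenization as the template: strip the line, split on single spaces
        word_count.update(line.strip().split(" "))
    return word_count


def count_to_file(in_file, out_file):
    """Count the words in in_file and write one 'word--count' line per word to out_file."""
    with open(in_file) as f:
        word_count = count_words(f)
    with open(out_file, 'w') as out:
        for word, n in word_count.items():
            out.write(word + "--" + str(n) + "\n")
    return word_count


print(count_words(["a b a", "b a"]))  # Counter({'a': 3, 'b': 2})
```

Splitting the counting from the file handling makes the counting easy to try on a list of strings before pointing it at a real file.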

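The code above splits a long English text into words with `split(" ")`, i.e. on single spaces, which is clearly not good enough. In `"I will endeavor," said he,` for example, `"I` and `he,` are each counted as a word, punctuation and all. The code is only meant to show the basic idea of counting word frequencies. Now consider the following exercise: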
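To see the problem concretely, the sketch below splits that fragment on spaces, then applies one simple cleanup: stripping leading and trailing punctuation from each token with `str.strip(string.punctuation)`. This cleanup rule is an assumption for illustration, not part of the original code; it leaves inner apostrophes (as in `don't`) untouched.

```python
import string

text = '"I will endeavor," said he,'

# Splitting on single spaces keeps punctuation attached to the words
raw_tokens = text.split(" ")
print(raw_tokens)  # ['"I', 'will', 'endeavor,"', 'said', 'he,']

# Strip punctuation from both ends of each token and drop any empty results
clean_tokens = [t.strip(string.punctuation) for t in raw_tokens]
clean_tokens = [t for t in clean_tokens if t]
print(clean_tokens)  # ['I', 'will', 'endeavor', 'said', 'he']
```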

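1. Copy a passage of English text from the web (the longer the better) into input.txt, count how many times each word occurs, and write the results to out.txt in order of frequency, one `word:count` per line.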

Solution, adapted from the template:

```python
# Count word frequencies and write them to a file in frequency order
in_file = 'input_word.txt'
out_file = 'output_word.txt'

def count_word(in_file, out_file):
    word_count = {}  # dictionary mapping each word to its frequency
    with open(in_file) as f:
        for line in f:
            words = line.strip().split(" ")
            for word in words:
                if word in word_count:
                    word_count[word] += 1
                else:
                    word_count[word] = 1
    with open(out_file, 'w') as out:
        # Visit the words from most to least frequent
        for word in sorted(word_count, key=word_count.get, reverse=True):
            print(word, word_count[word])
            out.write('%s:%d' % (word, word_count.get(word)))
            out.write('\n')

count_word(in_file, out_file)
```
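The exercise asks for the output in frequency order. A minimal sketch of one way to produce it, assuming ties should be broken alphabetically so the output is deterministic: sort the dictionary's items by count descending, then by word. `format_by_frequency` is a hypothetical helper name.

```python
def format_by_frequency(word_count):
    """Return 'word:count' lines, most frequent first, ties broken alphabetically."""
    # Negating the count sorts counts in descending order while words stay ascending
    items = sorted(word_count.items(), key=lambda kv: (-kv[1], kv[0]))
    return ['%s:%d' % (word, n) for word, n in items]


lines = format_by_frequency({'the': 3, 'cat': 1, 'a': 3, 'sat': 1})
print('\n'.join(lines))
# a:3
# the:3
# cat:1
# sat:1
```

The returned lines can then be written to out.txt with a single `out.write('\n'.join(lines) + '\n')`.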

The regular-expression approach:

```python
import re

words = {}
rc = re.compile(r'\w+')  # a "word" is a run of letters, digits or underscores

with open('input_word.txt') as f:
    for l in f:
        for w in rc.findall(l):
            if w in words:  # count each occurrence
                words[w] += 1
            else:
                words[w] = 1

with open('out.txt', 'w') as f:
    # Most frequent words first, as the exercise requires
    for k in sorted(words, key=words.get, reverse=True):
        print(k, words[k])
        f.write('%s:%d' % (k, words.get(k)))
        f.write('\n')
```
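`\w+` matches runs of letters, digits and underscores, so it drops the quotes and commas that tripped up `split(" ")`. It still splits contractions (`don't` becomes `don` and `t`) and treats `The` and `the` as different words. A compact variant, sketched under the assumption that case should be ignored, lowercases the text and combines `re.findall` with `collections.Counter.most_common()` to get frequency order; `regex_word_freq` is a hypothetical name.

```python
import re
from collections import Counter

WORD_RE = re.compile(r"\w+")  # runs of letters, digits and underscores


def regex_word_freq(text):
    """Return (word, count) pairs for text, most frequent first, ignoring case."""
    return Counter(WORD_RE.findall(text.lower())).most_common()


print(WORD_RE.findall('"I will endeavor," said he,'))  # ['I', 'will', 'endeavor', 'said', 'he']
print(WORD_RE.findall("don't"))                         # ['don', 't']
print(regex_word_freq("The cat saw the dog. THE END")[0])  # ('the', 3)
```

To count a whole file this way, the file's contents can be read into one string (for example with `f.read()` inside a `with open(...)` block) and passed to `regex_word_freq`.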

Notice: This article was contributed by a user. When reposting, please credit the source: [wpsshop blog]