
Counting word frequencies in an English book with Python, excluding stop words, and writing the results to Excel with openpyxl

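```python
import re
import warnings

from openpyxl import Workbook

# Silence openpyxl warnings
warnings.filterwarnings('ignore')

# Create the workbook, reuse its default sheet for the results, write a header row
wb = Workbook()
ws1 = wb.active
ws1.title = 'Word Frequency'
ws1['A1'] = 'Rank'
ws1['B1'] = 'Word'
ws1['C1'] = 'Frequency'
wb.save('./斯宾塞自传词频统计.xlsx')  # "Spencer autobiography word frequency"
print("Word-frequency worksheet created!")

# Read the book, lowercase it, and turn every non-letter into a space
with open('spencer.txt', encoding='utf-8', errors='ignore') as f:
    txt = f.read().lower()
txt = re.sub(r'[^a-zA-Z]', ' ', txt)
words = txt.split()
print("Text tokenized!")

# Clean the stop-word list the same way; a set makes membership tests fast
with open('stop_words.txt', encoding='utf-8') as f:
    stop_words = f.read().lower()
stop_words = set(re.sub(r'[^a-zA-Z]', ' ', stop_words).split())
print("Stop-word list prepared!")

# Drop stop words from the text
words = [x for x in words if x not in stop_words]
print("Stop words removed from the text!")

# Count how often each remaining word occurs
counts = {}
for word in words:
    if word in counts:
        counts[word] = counts[word] + 1
    else:
        counts[word] = 1
```
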
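
The manual dictionary loop above can also be written with the standard library's `collections.Counter`, whose `most_common()` method returns `(word, count)` pairs sorted by descending count, which is exactly the order the Rank / Word / Frequency columns need. The following is a minimal sketch, assuming the book text and the stop-word list have already been read into strings; the function name `rank_words` and its `top_n` parameter are hypothetical, not part of the original article.

```python
import re
from collections import Counter


def rank_words(text, stop_text, top_n=None):
    """Return [(rank, word, count), ...] sorted by descending count.

    Hypothetical helper: tokenizes like the script above (lowercase,
    non-letters become spaces), drops stop words, then counts with Counter.
    """
    words = re.sub(r'[^a-z]', ' ', text.lower()).split()
    stop_words = set(re.sub(r'[^a-z]', ' ', stop_text.lower()).split())
    counts = Counter(w for w in words if w not in stop_words)
    # most_common(None) returns every word; equal counts keep first-seen order
    return [(rank, word, n)
            for rank, (word, n) in enumerate(counts.most_common(top_n), start=1)]


if __name__ == '__main__':
    sample = "The cat saw the dog. The dog saw a cat, and the cat ran."
    for row in rank_words(sample, "the a and"):
        print(row)
```

Each tuple corresponds to one worksheet row, so with openpyxl the rows could be written under the header with `ws1.append(row)` and the file saved again with `wb.save(...)`. Passing `top_n` (for example `top_n=100`) keeps only the most frequent words.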
Disclaimer: this article was contributed by a user; please credit the source when reposting: [wpsshop]