
Counting Character Appearances in Dream of the Red Chamber (红楼梦)


 

This is one result of my learning process, and I welcome criticism and corrections.

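This script counts how often each character appears in Dream of the Red Chamber. Its main strengths are that it accounts for characters' alternative names and uses a fairly extensive list of excluded words. Fengjie (凤姐), for example, goes by many names, including 凤辣子, 凤姐, and 王熙凤; Daiyu (黛玉) also appears as 林黛玉, 林妹妹, and 林丫头. Leaving these aliases out easily leads to wrong results.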
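The alias handling described above can be sketched as a lookup table plus an exclusion set. This is a minimal illustration, not the script itself: `ALIASES`, `EXCLUDES`, and `count_names` are names made up for the example, only a few aliases are listed, and the token list is written by hand rather than produced by a segmenter.

```python
from collections import Counter

# Hypothetical alias table: each alternative name maps to one canonical name.
ALIASES = {
    "王熙凤": "凤姐", "熙凤": "凤姐", "凤辣子": "凤姐",
    "林黛玉": "黛玉", "林妹妹": "黛玉", "林丫头": "黛玉",
}

# Hypothetical exclusion set: frequent words that are not names.
EXCLUDES = {"什么", "一个", "说道"}


def count_names(tokens):
    """Count tokens after folding aliases and dropping excluded words."""
    counts = Counter()
    for tok in tokens:
        # Skip single characters and words on the exclusion list.
        if len(tok) == 1 or tok in EXCLUDES:
            continue
        counts[ALIASES.get(tok, tok)] += 1
    return counts


# Hand-written tokens standing in for segmenter output.
tokens = ["林妹妹", "说道", "黛玉", "凤辣子", "王熙凤", "凤姐", "的", "宝玉"]
result = count_names(tokens)
print(result.most_common(3))  # [('凤姐', 3), ('黛玉', 2), ('宝玉', 1)]
```

Without the alias table, the same tokens would give 凤姐 and 黛玉 a count of 1 each, which is the kind of undercount the paragraph above warns about.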

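Baoyu is undeniably the absolute protagonist. Daiyu is usually regarded as the second lead, with the romance between Baoyu and Daiyu as the main thread of the whole novel. Yet after Baoyu, the characters with the most appearances turn out to be Grandmother Jia (贾母), Fengjie, and Lady Wang (王夫人); Daiyu comes only after them. Quite surprising, isn't it?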


Here is the code:


    # CalDreamsV1.py
    import jieba

    # Read the full text of the novel (UTF-8 encoded).
    with open("Dreams.txt", "r", encoding="utf-8") as f:
        txt = f.read()

    # Frequent words that are not character names.
    excludes = {"什么", "一个", "我们", "那里", "如今", "你们", "说道", "知道", "起来", "这里",
                "出来", "姑娘", "他们", "众人", "奶奶", "自己", "一面", "只见", "两个",
                "怎么", "不是", "不知", "这个", "听见", "这样", "进来", "咱们", "告诉", "就是",
                "东西", "回来", "大家", "没有", "只是"}

    # Segment the text into a list of words.
    words = jieba.lcut(txt)

    # Count words, folding known aliases into one canonical name.
    # Note: 太太 and 老爷 are generic titles (太太 often refers to 王夫人),
    # so mapping them to a single character is only a rough approximation.
    counts = {}
    for word in words:
        if len(word) == 1:
            continue  # skip single characters and punctuation
        elif word in ("老太太", "太太", "老祖宗", "史太君"):
            rword = "贾母"
        elif word == "老爷":
            rword = "贾政"
        elif word == "宝二爷":
            rword = "宝玉"
        elif word in ("王熙凤", "熙凤", "凤辣子"):
            rword = "凤姐"
        elif word in ("林黛玉", "潇湘妃子", "林丫头", "林妹妹"):
            rword = "黛玉"
        elif word in ("宝姑娘", "宝丫头", "蘅芜君", "宝姐姐"):
            rword = "宝钗"
        else:
            rword = word
        counts[rword] = counts.get(rword, 0) + 1

    # Drop excluded words; pop() avoids a KeyError if a word never occurred.
    for word in excludes:
        counts.pop(word, None)

    # Sort by count, highest first, and print the top 10.
    items = list(counts.items())
    items.sort(key=lambda x: x[1], reverse=True)
    print("《红楼梦》人物出场次数")
    for word, count in items[:10]:
        print("{0:<10}{1:>5}".format(word, count))
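One caveat with the script above: it compares whole tokens, so an alias is only counted when the segmenter emits it as a single word. As a rough cross-check that avoids segmentation altogether, the aliases can be matched directly in the raw text with a regular expression. This is a different technique from the jieba-based script, sketched under assumptions: `ALIASES`, `count_mentions`, and the sample sentence are invented for the illustration, and only two characters are covered.

```python
import re
from collections import Counter

# Hypothetical table: every name form, canonical ones included, maps to a canonical name.
ALIASES = {
    "黛玉": "黛玉", "林黛玉": "黛玉", "林妹妹": "黛玉",
    "凤姐": "凤姐", "王熙凤": "凤姐", "熙凤": "凤姐",
}

# Try longer aliases first so a long alias is never cut short
# by a shorter one that shares its prefix.
pattern = re.compile("|".join(
    re.escape(name) for name in sorted(ALIASES, key=len, reverse=True)))


def count_mentions(text):
    """Count non-overlapping alias matches, grouped by canonical name."""
    return Counter(ALIASES[m.group()] for m in pattern.finditer(text))


# Invented sample sentence, not a quotation from the novel.
sample = "王熙凤笑道:林妹妹来了。黛玉见了凤姐,熙凤又笑。"
mentions = count_mentions(sample)
print(mentions)
```

Because matches do not overlap, 王熙凤 is counted once rather than again as 熙凤. On the real text, counts from this approach may differ from the token-based ones, and a large gap suggests an alias that the segmenter splits differently.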

 
