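Assignment requirements:
Analyze the part-of-speech (POS) distribution of two modern novels of different genres, for example a wuxia (martial arts) novel and a detective novel. Use a class to read in an entire novel, and use a natural language processing tool. Analyze the content during initialization: split the text into words and obtain each word's part of speech (verb, adjective, and so on). Indexing a class object should return the two main pieces of information, the word and its part of speech. Count the parts of speech in a method called on the class object. Visualize the frequencies of the main parts of speech with pie charts, and compare the pie charts of the two novels.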
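The indexing and counting requirements can be sketched independently of any particular segmenter. The class below is only an illustration: `TaggedText`, `pos_counts`, and the sample pairs are hypothetical names and data, not part of the assignment. It assumes the (word, flag) pairs have already been produced, for instance from `jieba.posseg.cut`, which yields objects with `word` and `flag` attributes.

```python
from collections import Counter


class TaggedText:
    """Holds (word, flag) pairs for one novel and counts POS flags."""

    def __init__(self, pairs):
        # `pairs` is any iterable of (word, flag) tuples; with jieba installed
        # it could be built as [(p.word, p.flag) for p in jieba.posseg.cut(text)].
        self.pairs = list(pairs)

    def __len__(self):
        return len(self.pairs)

    def __getitem__(self, index):
        # Indexing returns the two main pieces of information: word and POS flag.
        return self.pairs[index]

    def pos_counts(self, flags):
        # Count only the requested flags, keeping their order for plotting.
        counter = Counter(flag for _, flag in self.pairs)
        return [counter[f] for f in flags]


# Hand-written sample standing in for real segmenter output (hypothetical data).
sample = TaggedText([("他", "r"), ("慢慢", "d"), ("走", "v"),
                     ("了", "u"), ("。", "x"), ("跑", "v")])
print(sample[2])
print(sample.pos_counts(["r", "d", "v", "u"]))
```

A single `Counter` pass avoids rescanning the full flag list once per POS, which matters for a text the size of a whole novel.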
Full code:

    import jieba.posseg
    import matplotlib
    import matplotlib.pyplot as plt

    matplotlib.rcParams['font.sans-serif'] = ['SimHei']  # font that can render the Chinese labels
    matplotlib.rcParams.update({'font.size': 15})  # font size

    # jieba POS flags to count, with their Chinese display names:
    # adjective, adverb, noun, preposition, pronoun, particle, verb, modal particle
    word_type = ["a", "d", "n", "p", "r", "u", "v", "y"]
    word_type_chin = ["形容词", "副词", "名词", "介词", "代词", "助词", "动词", "语气词"]


    class Text:
        def __init__(self):
            # Read both novels in full; the two files use different encodings.
            with open("yitian.txt", mode="r", encoding="utf8") as txt1:
                a = txt1.read()
            with open("baiyexing.txt", mode="r", encoding="gbk") as txt2:
                b = txt2.read()
            self.txt = [a, b]

            self.output = [[], []]  # [word, flag] pairs, one list per novel
            self.flag = [[], []]    # POS flags only, one list per novel
            self.word = [[], []]    # count for each entry of word_type, per novel
            self.identify()

        def __getitem__(self, index):
            # text[0] is the wuxia novel, text[1] the detective novel;
            # each is a list of [word, flag] pairs.
            return self.output[index]

        def identify(self):
            for x in range(2):
                # Segment the text and tag each word with its part of speech.
                for text in jieba.posseg.cut(self.txt[x]):
                    self.output[x].append([text.word, text.flag])
                    self.flag[x].append(text.flag)
                # Count how often each flag in word_type occurs (exact matches only).
                for t in range(len(word_type)):
                    count = self.flag[x].count(word_type[t])
                    print(f"{word_type_chin[t]}: {count}")
                    self.word[x].append(count)

        def pie(self):
            fig = plt.figure(figsize=(8, 8), dpi=80)
            text_types = ["武侠", "侦探"]  # wuxia, detective
            for x in range(2):
                ax = fig.add_subplot(1, 2, x + 1)
                ax.pie(self.word[x],
                       labels=word_type_chin,  # pie chart labels
                       autopct="%d%%",
                       )
                # Title: "POS distribution of the <genre> novel"
                ax.set_title(f"{text_types[x]}小说的词性分布")
            plt.show()  # blocks until the window is closed


    text = Text()
    text_1, text_2 = text[0], text[1]
    print(f"武侠小说:\n{text_1}")
    print(f"\n侦探小说:\n{text_2}")

    text.pie()
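One thing to keep in mind about `self.flag[x].count(word_type[t])` is that it matches flags exactly. jieba's tagger also emits finer-grained flags such as `nr` (person name), `ns` (place name), `vn`, and `uj`, and these are not counted under `n`, `v`, or `u`. The sketch below shows one possible variation, not the method used in the code above: it groups flags by their first letter and converts the counts to percentages. Grouping by first letter is itself an assumption that may merge categories you would rather keep apart, and `grouped_counts`, `shares`, and the sample flags are hypothetical names and data.

```python
from collections import Counter

MAIN_FLAGS = ["a", "d", "n", "p", "r", "u", "v", "y"]


def grouped_counts(flags, main_flags=MAIN_FLAGS):
    """Count flags by first letter, so e.g. "nr" and "ns" are counted under "n"."""
    counter = Counter(flag[0] for flag in flags if flag)
    return [counter[m] for m in main_flags]


def shares(counts):
    """Convert counts to percentages of their total (all zeros if the total is 0)."""
    total = sum(counts)
    if total == 0:
        return [0.0 for _ in counts]
    return [100.0 * c / total for c in counts]


# Hypothetical sample of flags standing in for real tagger output.
flags = ["n", "nr", "ns", "v", "vn", "uj", "d", "x"]
counts = grouped_counts(flags)
print(counts)
print([round(s, 1) for s in shares(counts)])
```

The resulting list has the same order as `word_type`, so it could be passed to the pie chart in place of the exact-match counts.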
Data + code: