NLP Algorithms: Sentiment Analysis with the SnowNLP Library

Author: weixin_40725706 | 2024-04-06 11:39:02


The SnowNLP library

Introduction
About SnowNLP
SnowNLP features
Demo
- - Code implementation

Introduction

The previous post covered Jieba and Gensim, two libraries that make it easy to process Chinese text.
Today we take a look at SnowNLP.

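About SnowNLP

SnowNLP is a library written in Python that makes it easy to process Chinese text.
Its design was inspired by TextBlob, the English natural language processing library. Unlike TextBlob, SnowNLP does not depend on NLTK: all of its algorithms were implemented by its author, isnowfy.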

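SnowNLP features

Word segmentation with SnowNLP

Chinese word segmentation splits a passage of text into individual words. This is not an easy task, because Chinese is written without spaces between words.

For example, in the sentence “湖南省会是长沙” ("The capital of Hunan is Changsha"), the span “湖南省会” consists of the two words “湖南” (Hunan) and “省会” (provincial capital), whereas in “湖南省会不断发展” ("Hunan Province will keep developing"), the same span consists of “湖南省” (Hunan Province) and “会” (will).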
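
To see why this is hard, here is a minimal sketch of forward maximum matching, a simple dictionary-based segmenter. It is not how SnowNLP segments text; the function name `forward_max_match` and the tiny dictionary `VOCAB` are made up for illustration. Because it always takes the longest dictionary word at each position, it picks “湖南省” in both sentences and gets the first one wrong:

```python
# Toy dictionary, invented for this illustration only.
VOCAB = {'湖南', '湖南省', '省会', '会', '是', '长沙', '不断', '发展'}
MAX_LEN = max(len(w) for w in VOCAB)


def forward_max_match(text, vocab=VOCAB, max_len=MAX_LEN):
    """Greedy segmentation: at each position take the longest word found in vocab."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # Fall back to a single character when nothing in vocab matches.
            if candidate in vocab or size == 1:
                words.append(candidate)
                i += size
                break
    return words


print(forward_max_match('湖南省会是长沙'))    # ['湖南省', '会', '是', '长沙'] (wrong split)
print(forward_max_match('湖南省会不断发展'))  # ['湖南省', '会', '不断', '发展'] (right split)
```

Choosing between “湖南 / 省会” and “湖南省 / 会” requires looking at the surrounding context, which is what statistical segmenters try to model.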

SnowNLP gives fairly good segmentation results, as the following example shows:

from snownlp import SnowNLP
s1 = SnowNLP('湖南省会是长沙')
print(s1.words)    # ['湖南', '省会', '是', '长沙']
s2 = SnowNLP('湖南省会不断发展')
print(s2.words)    # ['湖南省', '会', '不断', '发展']

Output:
['湖南', '省会', '是', '长沙']
['湖南省', '会', '不断', '发展']

Pinyin annotation with SnowNLP

SnowNLP can also annotate each character with its pinyin, for example:

from snownlp import SnowNLP
s = SnowNLP('湖南省会是长沙')
print(s.pinyin)

Output:
['hu', 'nan', 'sheng', 'hui', 'shi', 'chang', 'sha']

Keyword extraction with SnowNLP

SnowNLP can extract keywords from a text and pick a sentence to serve as its summary, for example:

from snownlp import SnowNLP
# Adjacent string literals inside parentheses are joined into a single string.
text = ('国防科技大学是高素质新型军事人才培养和国防科技自主创新高地。'
        '要紧跟世界军事科技发展潮流，适应打赢信息化局部战争要求，'
        '抓好通用专业人才和联合作战保障人才培养，加强核心关键技术攻关，'
        '努力建设世界一流高等教育院校。')
s = SnowNLP(text)
print(s.keywords(3))
print(s.summary(1))

Output:
['科技', '人才', '军事']
['国防科技大学是高素质新型军事人才培养和国防科技自主创新高地']
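
SnowNLP's README credits TextRank for both keyword extraction and summarization. The sketch below is a simplified, generic TextRank over a list of already segmented words, not SnowNLP's own code; the function name `textrank_keywords` and its parameters (`window`, `damping`, `iterations`) are assumptions chosen for illustration. Words that appear next to each other are linked in a graph, and each word's score is repeatedly recomputed from its neighbors' scores, so words linked to many other words rank highest.

```python
from collections import defaultdict


def textrank_keywords(words, top_k=3, window=2, damping=0.85, iterations=50):
    """Rank words with a simplified TextRank over a co-occurrence graph."""
    # Link every word to the words that follow it within the window.
    neighbors = defaultdict(set)
    for i, word in enumerate(words):
        for other in words[i + 1:i + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    # PageRank-style update: a word inherits score from its neighbors.
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        scores = {
            word: (1 - damping) + damping * sum(
                scores[n] / len(neighbors[n]) for n in neighbors[word]
            )
            for word in neighbors
        }

    # Highest score first; ties are broken alphabetically for a stable result.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_k]


# Hypothetical pre-segmented input in which '科技' is adjacent to every other word.
words = ['科技', '人才', '科技', '军事', '科技', '发展']
print(textrank_keywords(words, top_k=1))  # ['科技']
```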

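Sentiment analysis with SnowNLP

SnowNLP can also perform sentiment analysis on Chinese text.

The result is a number between 0 and 1: the larger the number, the more positive the sentence; the smaller the number, the more negative.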

from snownlp import SnowNLP
s1 = SnowNLP('不错不错，楼主真棒')
s2 = SnowNLP('不知道你到底想说什么')
print(s1.sentiments)
print(s2.sentiments)

Output:
0.9719003655581226
0.2321400466499438
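
In practice the score is often turned into a label by comparing it with a cutoff. A minimal sketch, assuming a cutoff of 0.5 (an arbitrary choice, not something SnowNLP prescribes) and a hypothetical helper `label_sentiment`, applied to the two scores printed above:

```python
def label_sentiment(score, threshold=0.5):
    """Map a 0..1 sentiment score to a coarse label.

    The 0.5 default is an assumption made for this example.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError('score must be between 0 and 1')
    return 'positive' if score >= threshold else 'negative'


print(label_sentiment(0.9719003655581226))  # positive
print(label_sentiment(0.2321400466499438))  # negative
```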

Demo

Extract all replies from an HTML file and analyze how favorably the post was received.

Test file:

<!DOCTYPE html>
<html lang="en">
<body id="activity-detail" class="zh_CN mm_appmsg">
<p class="top">评论区域:</p>
        <div class="text">
            <a target="_blank">用户1</a>：赞！
</div>
<div class="text">
            <a target="_blank">用户2</a>：不错不错，楼主真棒！
</div>
<div class="text">
            <a target="_blank">用户3</a>：还可以。
</div>
<div class="text">
            <a target="_blank">用户4</a>：楼主加油。
</div>
<div class="text">
            <a target="_blank">用户5</a>：向楼主学习。
</div>
<div class="text">
            <a target="_blank">用户6</a>：感谢楼主。
</div>
<div class="text">
            <a target="_blank">用户7</a>：这个帖子真心很赞，推荐大家都看看。
</div>
<div class="text">
            <a target="_blank">用户8</a>：受益颇多。
</div>
<div class="text">
            <a target="_blank">用户9</a>：赞赞赞赞赞赞赞赞。
</div>
</body>
</html>

Code implementation

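from pyquery import PyQuery
from snownlp import SnowNLP

def evaluate(path):
    """Return the average sentiment of all replies in the HTML file as a 0-100 score."""
    # Task: extract all replies from the HTML file and analyze how favorably the post was received.
    with open(path, 'r', encoding='utf-8') as html:
        pq = PyQuery(html.read())

    score, count = 0, 0
    for div in pq('div.text').items():
        # Each reply reads "<user name>：<reply>"; keep only the part after the first full-width colon.
        _, _, txt = div.text().partition('：')
        txt = txt.strip()
        if not txt:
            continue
        count += 1
        score += SnowNLP(txt).sentiments

    # Avoid dividing by zero when the file contains no replies.
    return int(score * 100 / count) if count else 0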
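
The same idea can be sketched with only the standard library, which also makes the two steps (extracting replies, averaging scores) easy to test separately. The names `ReplyParser`, `extract_replies`, and `approval_percent` are hypothetical, and the scoring function is passed in as a parameter rather than hard-wired to SnowNLP:

```python
from html.parser import HTMLParser


class ReplyParser(HTMLParser):
    """Collect the text of each <div class="text">, skipping the <a> user name."""

    def __init__(self):
        super().__init__()
        self.replies = []
        self._in_reply = False  # currently inside <div class="text">
        self._in_name = False   # currently inside the <a> holding the user name
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == 'div' and ('class', 'text') in attrs:
            self._in_reply, self._buffer = True, []
        elif tag == 'a' and self._in_reply:
            self._in_name = True

    def handle_endtag(self, tag):
        if tag == 'a':
            self._in_name = False
        elif tag == 'div' and self._in_reply:
            # Drop surrounding whitespace and the leading full-width colon.
            reply = ''.join(self._buffer).strip().lstrip('：').strip()
            if reply:
                self.replies.append(reply)
            self._in_reply = False

    def handle_data(self, data):
        if self._in_reply and not self._in_name:
            self._buffer.append(data)


def extract_replies(html_text):
    """Return the reply texts found in html_text, in document order."""
    parser = ReplyParser()
    parser.feed(html_text)
    return parser.replies


def approval_percent(replies, scorer):
    """Average scorer(reply) over all replies and scale it to 0-100."""
    if not replies:
        return 0
    return int(sum(scorer(r) for r in replies) * 100 / len(replies))


sample = '<div class="text"><a target="_blank">用户1</a>：赞！</div>'
print(extract_replies(sample))  # ['赞！']
```

With SnowNLP installed, `approval_percent(extract_replies(html_text), lambda r: SnowNLP(r).sentiments)` would compute the same kind of score as `evaluate`.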

