
Machine Learning: A Weibo Public-Opinion Analysis System in Python, with Visualization, the Django Framework, and the K-means Clustering Algorithm (source code included) ✅



1. Project Introduction

Tech stack:
Python + Django framework + database + jieba word segmentation + scikit-learn machine learning (K-means clustering algorithm) + SnowNLP sentiment analysis

2. Project Screenshots

(1) Weibo public-opinion analysis
(2) Sentiment analysis visualization
(3) Weibo data browsing
(4) Top ten posts by comments
(5) K-means cluster analysis
(6) K-means cluster word clouds
(7) Back-end data management
(8) Registration and login pages

(Screenshots omitted.)

3. Project Notes

1. Technologies used:
Python + Django framework + database + jieba word segmentation + scikit-learn machine learning (K-means clustering algorithm) + SnowNLP sentiment analysis

2. SnowNLP is a widely used Python library for Chinese text analysis, inspired by TextBlob. Most natural-language-processing libraries target English, and since Chinese has no spaces delimiting words, Chinese text mining in Python is comparatively difficult; libraries such as SnowNLP, jieba, and BosonNLP were later developed specifically for Chinese. Note that SnowNLP operates on unicode strings, so decode your input to unicode before passing it in (in Python 3, ordinary str objects are already unicode).
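
As a quick illustration of the library, here is a minimal SnowNLP sentiment sketch (the sample sentence is arbitrary):

from snownlp import SnowNLP

s = SnowNLP(u'这个系统做得真不错')   # arbitrary sample sentence
print(s.sentiments)   # sentiment score in [0, 1]; closer to 1 means more positive
print(s.words)        # SnowNLP's built-in word segmentation

The fenlei view in the core code below applies exactly this score, labeling a post positive when the score exceeds a 0.45 threshold.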

4. Core Code



# Imports required by the views below; the model and decorator import paths
# are assumptions, adjust them to the project's actual layout.
import os
import re

import jieba
from tqdm import tqdm
from django.shortcuts import render, redirect
from django.http import JsonResponse
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

from .models import WeiBo, User   # project models (assumed module path)
# check_login is the project's own login-check decorator, defined elsewhere.


### Home page
@check_login
def index(request):
    # topic list
    topic_raw = [item.topic for item in WeiBo.objects.all() if item.topic]
    topic_list = []
    for item in topic_raw:
        topic_list.extend(item.split(','))
    topic_list = list(set(topic_list))
    # current user info
    uid = int(request.COOKIES.get('uid', -1))
    if uid != -1:
        username = User.objects.filter(id=uid)[0].name
    # selected topic
    if 'key' not in request.GET:
        key = topic_list[0]
        raw_data = WeiBo.objects.all()
    else:
        key = request.GET.get('key')
        raw_data = WeiBo.objects.filter(topic__contains=key)
    # pagination (20 items per page)
    if 'page' not in request.GET:
        page = 1
    else:
        page = int(request.GET.get('page'))
    data_list = raw_data[(page-1)*20 : page*20]
    return render(request, 'index.html', locals())
# Sentiment classification
def fenlei(request):
    from snownlp import SnowNLP
    # j = '我喜欢你'
    # s = SnowNLP(j)
    # print(s.sentiments)

    for item in tqdm(WeiBo.objects.all()):
        # score > 0.45 is treated as positive ('正向'), otherwise negative ('负向')
        emotion = '正向' if SnowNLP(item.content).sentiments > 0.45 else '负向'
        WeiBo.objects.filter(id=item.id).update(emotion=emotion)
    return JsonResponse({'status': 1, 'msg': '操作成功'})



# Login
def login(request):
    if request.method == "POST":
        tel, pwd = request.POST.get('tel'), request.POST.get('pwd')
        if User.objects.filter(tel=tel, password=pwd):

            obj = redirect('/')
            obj.set_cookie('uid', User.objects.filter(tel=tel, password=pwd)[0].id, max_age=60 * 60 * 24)
            return obj
        else:
            msg = "用户信息错误,请重新输入!!"
            return render(request, 'login.html', locals())
    else:
        return render(request, 'login.html', locals())

# Register
def register(request):
    if request.method == "POST":
        name, tel, pwd = request.POST.get('name'), request.POST.get('tel'), request.POST.get('pwd')
        if User.objects.filter(tel=tel):
            msg = "你已经有账号了,请登录"
        else:
            User.objects.create(name=name, tel=tel, password=pwd)
            msg = "注册成功,请登录!"
        return render(request, 'login.html', locals())
    else:
        msg = ""
        return render(request, 'register.html', locals())

# Logout
def logout(request):
    obj = redirect('index')
    obj.delete_cookie('uid')
    return obj

# Weibo visualization
@check_login
def plot(request):
    """
    Line chart: posts per day
    Bar chart:  top-20 dates by post count
    Pie chart:  positive vs. negative share
    Bar chart:  top-10 posts by comment count
    """
    uid = int(request.COOKIES.get('uid', -1))
    if uid != -1:
        username = User.objects.filter(id=uid)[0].name
    # 1. line chart: Weibo posts per day
    raw_data = WeiBo.objects.all()
    main1 = [item.time.strftime('%Y-%m-%d') for item in raw_data]
    main1_x = sorted(list(set(main1)))
    main1_y = [main1.count(item) for item in main1_x]



    # 2. bar chart: top-20 dates by post count
    raw_data = WeiBo.objects.all()
    main2 = [item.time.strftime('%Y-%m-%d') for item in raw_data]
    main2set = sorted(list(set(main2)))
    main2_x = {item:main2.count(item)  for item in main2set}
    main2 = sorted(main2_x.items(),key=lambda x:x[1],reverse=True)[:20]
    main2_x = [item[0] for item in main2]
    main2_y = [item[1] for item in main2]

    # 3. pie chart: sentiment distribution
    main3 = [item.emotion+'情感' for item in raw_data]
    main3_y = {}
    for item in main3:
        main3_y[item] = main3_y.get(item,0) + 1
    main3 = [{
        'value':v,
        'name':k
    } for k,v in main3_y.items() ]



    # 4. bar chart: top-10 posts by comment count
    raw_data = raw_data.order_by('-pinglun')[:10]
    main4_x = [f'id={item.id}' for item in raw_data]
    main4_y = [item.pinglun for item in raw_data]



    return render(request,'plot.html',locals())


#### Sentiment analysis visualization
@check_login
def qingganPlot(request):
    """
    Line chart:  positive/negative posts per day
    Bar chart:   top-20 high-frequency words in positive posts
    Pie chart:   positive vs. negative share
    Word clouds: positive words, negative words, and topics
    """
    uid = int(request.COOKIES.get('uid', -1))
    if uid != -1:
        username = User.objects.filter(id=uid)[0].name
    # 1. line chart: positive/negative posts per day
    raw_data = WeiBo.objects.all()
    main1 = [item.time.strftime('%Y-%m-%d') for item in raw_data]
    main1_x = sorted(list(set(main1)))
    main1_y1, main1_y2 = [], []
    for item in main1_x:
        year, month, day = (int(x) for x in item.split('-'))
        main1_y1.append(raw_data.filter(emotion='正向', time__year=year, time__month=month, time__day=day).count())
        main1_y2.append(raw_data.filter(emotion='负向', time__year=year, time__month=month, time__day=day).count())

    main1_data = ['正向','负向']
    main1_y = [
        {
            'name': '正向',
            'type': 'line',
            'data': main1_y1
        },
        {
            'name': '负向',
            'type': 'line',
            'data': main1_y2
        },

    ]



    # 2. bar chart: top-20 high-frequency words in positive posts
    # load the stop-word lists (one word per line)
    stop = []
    for fname in ('hit_stopwords.txt', 'scu_stopwords.txt', 'baidu_stopwords.txt', 'cn_stopwords.txt'):
        with open(os.path.join('stopwords', fname), 'r', encoding='UTF-8') as f:
            stop.extend(line.strip() for line in f)

    main5_data = WeiBo.objects.filter(emotion='正向')[:1000]
    main5_json = {}
    for item in main5_data:
        # strip '#', 'O', 'L', '.' (presumably leftovers from hashtags and Weibo link markers)
        text1 = list(jieba.cut(item.content.replace('#', '').replace('O', '').replace('L', '').replace('.', '')))
        for t in text1:
            if t in stop or t.strip() == '':
                continue
            main5_json[t] = main5_json.get(t, 0) + 1
    result_dict = sorted(main5_json.items(), key=lambda x: x[1], reverse=True)[:20]  # descending
    main2_x = [item[0]  for item in result_dict]
    main2_y = [item[1]  for item in result_dict]

    # 3. pie chart: sentiment distribution
    main3 = [item.emotion+'情感' for item in raw_data]
    main3_y = {}
    for item in main3:
        main3_y[item] = main3_y.get(item,0) + 1
    main3 = [{
        'value':v,
        'name':k
    } for k,v in main3_y.items() ]




    # 5. word cloud: top-30 words in positive posts
    # (reuses the stop-word lists loaded above)

    main5_data = WeiBo.objects.filter(emotion='正向')[:1000]
    main5_json = {}
    for item in main5_data:
        text1 = list(jieba.cut(item.content))
        for t in text1:
            if t in stop or t.strip() == '':
                continue
            main5_json[t] = main5_json.get(t, 0) + 1
    result_dict = sorted(main5_json.items(), key=lambda x: x[1], reverse=True)[:30]  # descending
    main5_data = [{
        "name": item[0],
        "value": item[1]
    } for item in result_dict]
    # 6. word cloud: top-30 words in negative posts
    main6_data = WeiBo.objects.filter(emotion='负向')[:1000]
    main6_json = {}
    for item in main6_data:
        text1 = list(jieba.cut(item.content))
        for t in text1:
            if t in stop or t.strip() == '':
                continue
            main6_json[t] = main6_json.get(t, 0) + 1
    result_dict = sorted(main6_json.items(), key=lambda x: x[1], reverse=True)[:30]  # descending
    main6_data = [{
        "name": item[0],
        "value": item[1]
    } for item in result_dict]
    # 7. topic word cloud (top-10 topics by frequency)
    topic_raw = [item.topic for item in WeiBo.objects.all() if item.topic]
    topic_list = []
    for item in topic_raw:
        topic_list.extend(item.split(','))
    topic_set = list(set(topic_list))
    main7_data = [{
        "name": item,
        "value": topic_list.count(item)
    } for item in topic_set]

    main7_data = sorted(main7_data,key=lambda  x:x['value'],reverse=True)[:10]
    return render(request,'qingganPlot.html',locals())

# Profile page
@check_login
def my(request):
    uid = int(request.COOKIES.get('uid', -1))
    if uid != -1:
        username = User.objects.filter(id=uid)[0].name
    if request.method == "POST":
        name,tel,password = request.POST.get('name'),request.POST.get('tel'),request.POST.get('password1')
        User.objects.filter(id=uid).update(name=name,tel=tel,password=password)
        return redirect('/')
    else:
        my_info = User.objects.filter(id=uid)[0]
        return render(request,'my.html',locals())


# Clean text: remove letters, digits, and punctuation
def clearTxt(line: str):
    if line != '':
        line = line.strip()
        # remove English letters and digits
        line = re.sub(r"[a-zA-Z0-9]", "", line)
        # remove Chinese and English punctuation
        line = re.sub(r"[\s+\.\!\/_,$%^*(+\"\';:“”.]+|[+——!,。??、~@#¥%……&*()]+", "", line)
        return line
    return None

# Tokenize: segment a sentence into space-separated words
def sent2word(line):
    segList = jieba.cut(line, cut_all=False)
    segSentence = ''
    for word in segList:
        if word != '\t':
            segSentence += word + " "
    return segSentence.strip()
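
# Example usage (illustrative; the exact segmentation depends on jieba's dictionary):
#   clearTxt('我爱Python3!')  ->  '我爱'
#   sent2word('我爱编程')      ->  '我 爱 编程'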
# K-means clustering and visualization
def kmeansPlot(request):
    uid = int(request.COOKIES.get('uid', -1))
    if uid != -1:
        username = User.objects.filter(id=uid)[0].name

    # number of clusters (query-string parameter, default 2)
    if 'num' in request.GET:
        num = int(request.GET.get('num'))
    else:
        num = 2
    ### Training
    # clean and tokenize the corpus
    clean_data = [item.content for item in WeiBo.objects.all()]
    clean_data = [clearTxt(item) for item in clean_data]
    clean_data = [sent2word(item) for item in clean_data]

    # CountVectorizer builds a term-frequency matrix: element a[i][j] is the
    # frequency of term j in document i
    vectorizer = CountVectorizer(max_features=20000)
    # TfidfTransformer computes a tf-idf weight for every term
    tf_idf_transformer = TfidfTransformer()
    # convert the texts to a term-frequency matrix and compute tf-idf
    tfidf = tf_idf_transformer.fit_transform(vectorizer.fit_transform(clean_data))
    # dense tf-idf matrix
    tfidf_matrix = tfidf.toarray()
    # all terms in the bag-of-words vocabulary
    # (on scikit-learn >= 1.0 use get_feature_names_out() instead)
    word = vectorizer.get_feature_names()

    # cluster into `num` classes
    from sklearn.cluster import KMeans
    clf = KMeans(n_clusters=num)
    result_list = list(clf.fit_predict(tfidf_matrix))


    ### Visualization
    # 1. pie chart: cluster sizes, in ECharts form {'value': ..., 'name': ...}
    pie_data = [
        {
            'value': result_list.count(i),
            'name': f'第{i+1}类'
        }
        for i in range(num)
    ]


    div_id_list = [f'container{i+1}' for i in range(num)]


    data_list = []
    for label, name in enumerate(div_id_list):
        tmp = {'id': name, 'data': [], 'title': f'第{label+1}类'}
        # gather all texts assigned to this cluster
        tmp_text_list = ''
        for la, text in zip(result_list, clean_data):
            if la == label:
                tmp_text_list += ' ' + text
        tmp_text_list = [item for item in tmp_text_list.split(' ') if item.strip() != '']

        # word-frequency ranking; keep the top 100
        rank_Data = [
            {
                'value': tmp_text_list.count(item),
                'name': item
            }
            for item in set(tmp_text_list)
        ]
        rank_Data = sorted(rank_Data, key=lambda x: x['value'], reverse=True)[:100]

        tmp['data'] = rank_Data
        data_list.append(tmp)

    return render(request, 'kmeansPlot.html', locals())
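
To exercise the clustering step in isolation, here is a minimal, self-contained sketch of the same CountVectorizer → TfidfTransformer → KMeans pipeline used in kmeansPlot; the three sample documents are invented for illustration:

from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.cluster import KMeans

# pre-tokenized, space-separated documents, as produced by clearTxt + sent2word
docs = ['气候 变化 影响 农业', '极端 天气 灾害 预警', '股市 上涨 经济 复苏']

tf = CountVectorizer().fit_transform(docs)      # term-frequency matrix
tfidf = TfidfTransformer().fit_transform(tf)    # tf-idf weighting
labels = KMeans(n_clusters=2, n_init=10).fit_predict(tfidf)
print(labels)   # one cluster label per document, e.g. [0 0 1]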


