基于上证金融数据的情感分析和走势预测代码+数据_pyspark进行股票情感预测

作者：菜鸟追梦旅行 | 2024-04-03 07:57:25

踩

pyspark进行股票情感预测

情感分析结果：

情感分析结果：

首先是获取股票评论数据的网站：

上证指数股吧_上证指数分析讨论社区-东方财富网

程序：


import requests, re, json, pandas as pd, time
from selenium import webdriver  # selenium2.48.0 支持phantomjs
from lxml import etree
from tqdm import tqdm
from snownlp import SnowNLP
import time
# pip install selenium==2.48.0   -i https://mirror.baidu.com/pypi/simple
# 表明我是谁？？
driver = webdriver.PhantomJS(executable_path=r'C:\Users\wang\Desktop\phantomjs-2.1.1-windows (1)\bin\phantomjs.exe')
# C:\Users\wang\Desktop\phantomjs-2.1.1-windows (1)\bin
data_text=[]
data_time=[]
with open("data_time_txt","a+",encoding="utf-8")as f:
    for  i in tqdm(range(1,2000)):
        url = "http://guba.eastmoney.com/list,zssh000001_"+str(i)+".html"
        driver.get(url=url)
        tree = etree.HTML(driver.page_source)
        data_wenbens =  tree.xpath('.//div[@class="articleh normal_post odd"]/span[@class="l3 a3"]/a/@title')
        print(data_wenbens)
        data_times =  tree.xpath('.//div[@class="articleh normal_post odd"]/span[@class="l5 a5"]/text()')
        print(data_times)
        if len(data_times)==len(data_wenbens) and len(data_times)>2:
            for j in range(len(data_times)):
                temp_text=data_wenbens[j]
                temp_time=str(data_times[j]).split(" ")[0]
                nlp = SnowNLP(str(temp_text))
                f.write(str(temp_text)+"__****__"+str(temp_time)+"__****__"+str(nlp.sentiments)+"\n")
        time.sleep(3)

数据展示：

写入txt 用__****__ 做分割，使用 SnowNLP 做开源的情感分析

$上证指数(SH000001)$天天就知道拉证券__****__06-10__****__0.7380620377395649
来来来，冲鸭，为了牛市，全仓梭哈__****__06-10__****__0.32144879323501274
该死的狐狸，我彻底踏空了，你心真坏，什么鱼尾诱多行情，纯粹，现在我也不敢__****__06-10__****__0.1948958915690776
昨天调整给你们机会你们把握不住，怪谁[大笑]__****__06-10__****__0.8689312767645354
接刀行情__****__06-10__****__0.890142453148024
无量反弹继续跌__****__06-10__****__0.1925025397374015
缩量反抽发套中。小心接飞刀，随时跳水。__****__06-10__****__0.1587406976493484
$上证指数(SH000001)$等你跳水__****__06-10__****__0.5348199281758914
为什么叫A股__****__06-10__****__0.31041609369117307
$上证指数(SH000001)$我是空狗，说实话，大盘长的心好慌__****__06-10__****__0.7589232220245196
$上证指数(SH000001)$散户真正亏夲牛市开启__****__06-10__****__0.7707121238079945
$上证指数(SH000001)$东南亚四国光伏豁免只是面子操作，取消中国关税才是__****__06-10__****__0.7173039202475434
【转载】碳达峰碳中和是国家中长期战略的重要组成部分__****__06-10__****__0.9995428587322593
比亚迪加油，争取创下万亿市值公司炒到一千倍市盈率历史记录，让世界看看我大A的霸气__****__06-10__****__0.9990492400663739
别等了，上车看风景[微笑]__****__06-10__****__0.9463208775361536
$上证指数(SH000001)$量化交易有力度__****__06-10__****__0.10875990306767358
最近大a真牛逼__****__06-10__****__0.5586189117390599
$上证指数(SH000001)$不要轻视两特大城市疫情对上市公司的影响，中报会很__****__06-10__****__0.31133644182461606
$上证指数(SH000001)$低开洗盘，己明确上攻趋势[大笑][大笑][大笑]__****__06-10__****__0.8614209827857494
诱多__****__06-10__****__0.875

根据每天的数据做股票的情感分析数据：


import  numpy as np
data_scorce=[[] for i in range(10)]
print(data_scorce)
data_time=[i for i in range(10) ]
with open("data_time_txt","r",encoding="utf-8")as f:
    f=f.readlines()
    for line in f:
        sorce=line.split("__****__")[-1]
        time=line.split("__****__")[1][-2:]
        if time=="10":
            data_scorce[int(time)-1].append(float(sorce))
        if time[0]=="0":
            data_scorce[int(time[-1])-1].append(float(sorce))
import matplotlib.pyplot as plt
import pandas as pd
plt.rcParams['font.sans-serif']=['SimHei']
plt.rcParams['axes.unicode_minus']=False
from pylab import *
score=[np.mean(i) for i in data_scorce]
x_low=[i for i in range(len(score))]
plt.xlabel('日期',fontsize=8)
plt.ylabel('情感指数',fontsize=8)
plt.plot(x_low,score,label='时间情感指数',color="r")
plt.title("上证指数股票评论—情感分析")
plt.show()

所有的代码数据：

股票评论数据情感分析（随时间变化）上证指数吧-自然语言处理文档类资源-CSDN下载

声明：本文内容由网友自发贡献，不代表【wpsshop博客】立场，版权归原作者所有，本站不承担相应法律责任。如您发现有侵权的内容，请联系我们。转载请注明出处：https://www.wpsshop.cn/w/菜鸟追梦旅行/article/detail/354782?site

基于上证金融数据的情感分析和走势预测 代码+数据_pyspark进行股票情感预测

情感分析结果：

首先是获取 股票评论数据的网站：

程序：

数据展示：

基于上证金融数据的情感分析和走势预测代码+数据_pyspark进行股票情感预测

首先是获取股票评论数据的网站：