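I've been pretty sleepy these past few days, and I wrote this code myself, so it may contain mistakes!!
That said, code that runs is code that wins. I'd appreciate it if the experts here could point out any errors. Thanks in advance from a newbie!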
- import pandas as pd
- import matplotlib.pyplot as plt
- import numpy as np
- plt.rcParams['font.sans-serif']=['SimHei'] # render Chinese labels correctly
- plt.rcParams['axes.unicode_minus']=False # render the minus sign correctly
-
-
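- # Load the data
- df = pd.read_csv(r'D:\钉钉杯\2022年首届钉钉杯大学生大数据挑战赛练习题目\练习题A\数据集\data.csv')
- df = df.drop(columns = 'Unnamed: 0')
-
- # Plots and exploratory analysis
-
- # Strip the '万' (10,000 yuan) unit from 总价 (total price) and convert to float
- df['总价'] = df['总价'].map(lambda x: float((str(x).replace('万',''))))
- # Mean total price per 区域 (district); the string 'mean' avoids the np.mean deprecation warning
- mean_df = df.groupby('区域')['总价'].agg(['mean'])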
- x = list(mean_df.index)
- y = list(mean_df['mean'])
- plt.title('区域二手房均价')
- plt.bar(x,y)
-
- plt.show()
- # Share of listings per 区域 (district), drawn as a pie chart
- count_df = df['区域'].value_counts()
- count_df = pd.DataFrame(count_df)
- count_df.columns = ['count']
- count_df['count'].plot(kind ='pie',title = '区域二手房数据占比',autopct = '%1.2f%%')
-
- plt.show()
- # Number of listings per 装修 (renovation level), drawn as a bar chart
- zx_count_df = pd.DataFrame(df['装修'].value_counts())
- zx_count_df.plot(kind = 'bar',title = '二手房装修程度分析',ylabel = '数量')
- plt.show()
- # Mean total price of the five most common 户型 (layouts)
- hx_df = df.groupby('户型')['总价'].agg(['mean'])
- hx = list(df['户型'].value_counts().head(5).index)
- # Keep only those five layouts and name the column 均值 (mean)
- hx_df = hx_df.loc[hx].rename(columns={'mean': '均值'})
- hx_df.plot(kind = 'bar',title='热门户型均价')
- plt.show()
-
-
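- # Build a house-pricing model; a fairly simple GBDT model is used here
- # The data contains missing values, so drop every row that has any
- df = df.dropna(axis=0,how='any')
- # Regression with a GBDT model; there is room for improvement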
- from sklearn.model_selection import train_test_split
- from sklearn.preprocessing import LabelEncoder
- from sklearn.ensemble import GradientBoostingRegressor
- le =LabelEncoder()
-
- # Encode the string columns as integers (each fit_transform call refits le on that column)
- df['户型']=le.fit_transform(df['户型'])
- df['朝向']=le.fit_transform(df['朝向'])
- df['楼层']=le.fit_transform(df['楼层'])
- df['装修']=le.fit_transform(df['装修'])
- df['区域']=le.fit_transform(df['区域'])
-
- X = df.drop(columns=['小区名字','总价','建筑面积','单价'])
- y = df['总价']
-
- # Split into training and test sets
- X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.2,random_state=666)
-
- # Fit the model
- model = GradientBoostingRegressor(random_state=666)
- model.fit(X_train,y_train)
- y_pred = model.predict(X_test)
- a = pd.DataFrame()
-
- a['预测值'] = list(y_pred)
- a['实际值'] = list(y_test)
-
- print(a)
-
- # For a regressor, score() returns R² on the test set
- score = model.score(X_test,y_test)
- print('模型得分为:%s'%score)
Output:
模型得分为:0.6562596295598755
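For anyone wondering what that number means: 模型得分为 is just "model score", and for scikit-learn regressors `score` returns the coefficient of determination R². Below is a minimal sketch of how R² is computed by hand, with mean absolute error alongside it as a second, more tangible metric. The helper names `r2` and `mae` and the toy numbers are made up for illustration and are not from the competition data.

```python
import numpy as np

def r2(y_true, y_pred):
    """R² = 1 - SS_res / SS_tot, the quantity a regressor's score() reports."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return 1.0 - ss_res / ss_tot

def mae(y_true, y_pred):
    """Mean absolute error, in the same unit as the target (here 万)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean(np.abs(y_true - y_pred))

# Toy values (hypothetical), standing in for y_test and y_pred
toy_true = [100.0, 200.0, 300.0]
toy_pred = [110.0, 190.0, 330.0]

toy_r2 = r2(toy_true, toy_pred)
toy_mae = mae(toy_true, toy_pred)
print('R2 = %.3f, MAE = %.3f' % (toy_r2, toy_mae))
```

With the real model, the same functions could be called as `r2(y_test, y_pred)` and `mae(y_test, y_pred)`; the first should agree with `model.score(X_test, y_test)`.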
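After finishing the code it kept throwing errors, and only later did I find out that missing values were the cause. The odd part is that checking with isnull did not reveal any missing values, yet they were there. It drove me up the wall, and once I found the problem I honestly wanted to stab my computer.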
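I can't say for sure what went wrong with the original check, so treat the following as a guess rather than a diagnosis. Two common traps are reading the truncated True/False table that `df.isnull()` prints instead of the per-column totals from `df.isnull().sum()`, and "missing" cells that are really blank or placeholder strings, which `isnull` does not count. Here is a small sketch on a toy frame. The values and the placeholder list are hypothetical and not taken from data.csv.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for data.csv; all values are made up for illustration.
toy = pd.DataFrame({
    '区域': ['A', 'B', None, 'A'],
    '装修': ['精装', ' ', '简装', '暂无数据'],
    '总价': [100.0, np.nan, 80.0, 120.0],
})

# Per-column count of true missing values (NaN / None).
nan_counts = toy.isna().sum()
print(nan_counts)

# Hypothetical placeholder strings that isna() does not treat as missing.
placeholders = ['', '暂无数据']

cleaned = toy.copy()
for col in ['区域', '装修']:
    # Trim whitespace so that ' ' becomes '' before comparing.
    stripped = cleaned[col].str.strip()
    # Turn placeholder strings into real missing values.
    cleaned[col] = stripped.mask(stripped.isin(placeholders))

hidden_counts = cleaned.isna().sum()
print(hidden_counts)

# Only rows without any missing value survive dropna().
complete_rows = cleaned.dropna(axis=0, how='any')
```

On the real data, comparing `df.isna().sum()` before and after such a cleanup would show whether any column hides placeholder values.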