Data Mining: Solving the Diabetes Problem with Gradient Descent

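I. Problem Description

Implement the gradient descent algorithm for linear regression, apply it to the diabetes prediction problem, and output the MSE and R² values.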
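
For reference, the objective and update rule that the gradient descent code in this article corresponds to can be written as below. Here $X$ is the $m \times n$ feature matrix (including a bias column), $y$ the target column vector, $w$ the weight vector, $\eta$ the learning rate (`eta` in the code) and $\epsilon$ the stopping threshold (`epsilon`). The name $J$ is introduced here only for illustration.

```latex
J(w) = \frac{1}{m}\,\lVert Xw - y \rVert_2^2,
\qquad
\nabla J(w) = \frac{2}{m}\, X^{\top}(Xw - y),
\qquad
w \leftarrow w - \eta\, \nabla J(w)
```

The iteration stops once $\lVert \nabla J(w) \rVert_2 < \epsilon$.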

II. Objective

Become proficient in applying gradient descent to linear regression.

III. Experiment

1. Import the data

    from sklearn.datasets import load_diabetes

    X, y = load_diabetes(return_X_y=True)  # load the feature matrix X and the target y
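
A quick way to see what was loaded is to inspect the shapes and the scaling of the features. The sketch below assumes scikit-learn's default settings for `load_diabetes`, where the ten feature columns are already mean-centred and scaled to unit sum of squares. The names `col_means` and `col_sumsq` are illustrative.

```python
import numpy as np
from sklearn.datasets import load_diabetes

X, y = load_diabetes(return_X_y=True)

print(X.shape)  # (442, 10): 442 patients, 10 features
print(y.shape)  # (442,): one disease-progression score per patient

# With the default settings the bundled features are mean-centred and
# each column is scaled so that its squared values sum to 1.
col_means = X.mean(axis=0)
col_sumsq = (X ** 2).sum(axis=0)
print(np.allclose(col_means, 0.0, atol=1e-8), np.allclose(col_sumsq, 1.0))
```

Because `y` comes back one-dimensional, it has to be reshaped into a column vector before it can be used with the matrix formulation of gradient descent.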

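2. Standardize the data and train the model

`train_test_split` comes from `sklearn.model_selection`; `process_features` and `LinearRegression` are defined in the full code in Section V.

    y = y.reshape(-1, 1)  # turn the 1-D target into a column vector of shape (442, 1)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=5)
    X_train = process_features(X_train)  # standardize the features and prepend a bias column
    X_test = process_features(X_test)    # same for the test set (this refits the scaler on the test data)
    model = LinearRegression()
    model.fit(X_train, y_train, eta=0.001, epsilon=0.0001)  # eta: learning rate, epsilon: gradient-norm threshold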
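
The snippet above standardizes the training and test sets separately. A common alternative, not used in the original code, is to compute the statistics on the training set only and reuse them for the test set. The following is a minimal NumPy sketch of that idea on synthetic stand-in data; `fit_standardizer` and `apply_standardizer` are hypothetical helper names.

```python
import numpy as np

def fit_standardizer(X_train):
    """Return column means and standard deviations computed on the training set."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant columns
    return mean, std

def apply_standardizer(X, mean, std):
    """Standardize X with the given statistics and prepend a bias column of ones."""
    Z = (X - mean) / std
    return np.c_[np.ones((Z.shape[0], 1)), Z]

# Synthetic stand-in data (assumption: 3 features, 60 training and 40 test rows).
rng = np.random.default_rng(0)
X_train = rng.normal(loc=5.0, scale=2.0, size=(60, 3))
X_test = rng.normal(loc=5.0, scale=2.0, size=(40, 3))

mean, std = fit_standardizer(X_train)
X_train_s = apply_standardizer(X_train, mean, std)
X_test_s = apply_standardizer(X_test, mean, std)
print(X_train_s.shape, X_test_s.shape)  # (60, 4) (40, 4)
```

With this approach the test set is transformed exactly as new, unseen data would be, so the reported MSE and R² are not influenced by test-set statistics.
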
3. Predict

    y_pred = model.predict(X_test)
4. Compute the mean squared error and R²

    mse = mean_squared_error(y_test, y_pred)
    score = r2_score(y_test, y_pred)
    print("mse={} and r2={}".format(mse, score))
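
To make the two metrics concrete, here is a small worked example on three made-up values, using the same formulas as the `mean_squared_error` and `r2_score` functions defined in the full code. The numbers are illustrative and not taken from the diabetes data.

```python
import numpy as np

y_true = np.array([[3.0], [5.0], [7.0]])
y_pred = np.array([[2.0], [5.0], [9.0]])

# Mean squared error: average of the squared residuals (1, 0, 4).
mse = np.average((y_true - y_pred) ** 2, axis=0)

# R^2: 1 - (residual sum of squares) / (total sum of squares around the mean).
numerator = ((y_true - y_pred) ** 2).sum(axis=0)                        # 5
denominator = ((y_true - np.average(y_true, axis=0)) ** 2).sum(axis=0)  # 8
r2 = 1 - numerator / denominator

print(mse, r2)  # [1.66666667] [0.375]
```

An R² of 1 means perfect predictions, 0 means the model does no better than always predicting the mean of `y_true`, and negative values mean it does worse.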

5. Plot the results

    # Plotting helper: X and Y are two-column arrays (column 0: sample index, column 1: value)
    def printLine(x_name, y_name, title, X, Y):  # x-axis label, y-axis label, title, two arrays
        plt.figure(1)
        plt.plot(X[:, 0], X[:, 1], 'bo', ms=3)                # blue points
        plt.plot(X[:, 0], X[:, 1], 'b', ms=3, label='line1')  # blue solid line
        plt.plot(Y[:, 0], Y[:, 1], 'ro', ms=3)                # red points
        plt.plot(Y[:, 0], Y[:, 1], 'r', ms=3, label='line2')  # red solid line
        plt.xlabel(x_name, fontproperties=font)  # step 3: apply the font to the labels and title
        plt.ylabel(y_name, fontproperties=font)
        plt.title(title, fontproperties=font)
        plt.show()
        return 0
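
The helper above sets `label` on its lines but never draws a legend. As a variation (not the original function), the sketch below plots predicted and actual values against the sample index, adds a legend, and returns the figure instead of calling `plt.show()`. The function name `plot_pred_vs_true`, the labels, and the choice of the non-interactive `Agg` backend are assumptions made so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to open a window
import matplotlib.pyplot as plt
import numpy as np

def plot_pred_vs_true(y_pred, y_true):
    """Plot predictions (blue) and actual values (red) against a 1-based sample index."""
    idx = np.arange(1, len(y_true) + 1)
    fig, ax = plt.subplots()
    ax.plot(idx, y_pred, "b-o", ms=3, label="predicted")
    ax.plot(idx, y_true, "r-o", ms=3, label="actual")
    ax.set_xlabel("Sample index")
    ax.set_ylabel("Target value")
    ax.set_title("Predicted vs. actual")
    ax.legend()  # show which colour is which
    return fig, ax

# Made-up values, only to exercise the function.
fig, ax = plot_pred_vs_true(np.array([1.0, 2.5, 2.0]), np.array([1.2, 2.0, 2.4]))
```

Returning the figure leaves it to the caller to decide whether to display it or save it with `fig.savefig(...)`.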

IV. Results and Analysis

1. Diabetes data

2. Run output

V. Full Code

Machine learning GitHub repository: https://github.com/wanglei18/machine_learning

    import numpy as np


    class LinearRegression:
        def fit(self, X, y, eta, epsilon):
            # Batch gradient descent: eta is the learning rate,
            # epsilon the threshold on the gradient norm.
            m, n = X.shape
            w = np.zeros((n, 1))
            while True:
                e = X.dot(w) - y        # residuals
                g = 2 * X.T.dot(e) / m  # gradient g of the mean squared error
                w = w - eta * g
                if np.linalg.norm(g, 2) < epsilon:
                    break
            self.w = w

        # make predictions
        def predict(self, X):
            return X.dot(self.w)
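
The `while True` loop above only stops when the gradient norm falls below `epsilon`, so a learning rate that is too large would make it run forever. The following is a hedged variant, not the original class, that adds an iteration cap and checks the result against NumPy's least-squares solver on synthetic data. The function name `gradient_descent`, the `max_iter` parameter, and the test data are assumptions; unlike the original, this version checks the gradient before applying the update.

```python
import numpy as np

def gradient_descent(X, y, eta, epsilon, max_iter=100_000):
    """Minimize the mean squared error; return the weights and the iterations used."""
    m, n = X.shape
    w = np.zeros((n, 1))
    for it in range(max_iter):
        g = 2 * X.T.dot(X.dot(w) - y) / m  # gradient of the mean squared error
        if np.linalg.norm(g, 2) < epsilon:
            return w, it
        w = w - eta * g
    return w, max_iter  # cap reached without meeting the threshold

# Synthetic data: bias column plus two standard-normal features.
rng = np.random.default_rng(0)
X = np.c_[np.ones((200, 1)), rng.normal(size=(200, 2))]
true_w = np.array([[1.0], [2.0], [-3.0]])
y = X.dot(true_w) + 0.1 * rng.normal(size=(200, 1))

w_gd, n_iter = gradient_descent(X, y, eta=0.1, epsilon=1e-6)
w_ls, *_ = np.linalg.lstsq(X, y, rcond=None)  # closed-form reference solution
print(n_iter < 100_000, np.allclose(w_gd, w_ls, atol=1e-4))
```

Agreement with the least-squares solution is a convenient sanity check that the gradient and the update rule are implemented consistently.
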
    import numpy as np
    import matplotlib.pyplot as plt
    from machine_learning.homework.libs.grade import LinearRegression
    from sklearn.datasets import load_diabetes
    from sklearn.preprocessing import StandardScaler
    from sklearn.model_selection import train_test_split
    from matplotlib.font_manager import FontProperties  # step 1: import the font helper

    # step 2: load the font used for the axis labels and title (Windows path to SimSun)
    font = FontProperties(fname=r"c:\windows\fonts\simsun.ttc", size=15)


    # Prepend a 1-based sample index as the first column (used as the x-axis when plotting)
    def addLine(X):
        length = X.shape[0]  # number of rows
        num = np.arange(1, length + 1).reshape(-1, 1)  # x-axis coordinates 1..length
        X = np.c_[num, X]  # combine the index column with the values
        return X


    # Plotting helper: X and Y are two-column arrays (column 0: sample index, column 1: value)
    def printLine(x_name, y_name, title, X, Y):  # x-axis label, y-axis label, title, two arrays
        plt.figure(1)
        plt.plot(X[:, 0], X[:, 1], 'bo', ms=3)                # blue points
        plt.plot(X[:, 0], X[:, 1], 'b', ms=3, label='line1')  # blue solid line
        plt.plot(Y[:, 0], Y[:, 1], 'ro', ms=3)                # red points
        plt.plot(Y[:, 0], Y[:, 1], 'r', ms=3, label='line2')  # red solid line
        plt.xlabel(x_name, fontproperties=font)  # step 3: apply the font to the labels and title
        plt.ylabel(y_name, fontproperties=font)
        plt.title(title, fontproperties=font)
        plt.show()
        return 0


    # Mean squared error
    def mean_squared_error(y_true, y_pred):
        return np.average((y_true - y_pred) ** 2, axis=0)


    # R^2 coefficient of determination
    def r2_score(y_true, y_pred):
        numerator = (y_true - y_pred) ** 2
        denominator = (y_true - np.average(y_true, axis=0)) ** 2
        return 1 - numerator.sum(axis=0) / denominator.sum(axis=0)


    # Feature preprocessing: standardize each column and prepend a bias column of ones
    def process_features(X):
        scaler = StandardScaler()
        X = scaler.fit_transform(X)
        m, n = X.shape
        X = np.c_[np.ones((m, 1)), X]
        return X


    X, y = load_diabetes(return_X_y=True)  # load the data
    print(X.shape, X)
    print(y.shape, y)
    y = y.reshape(-1, 1)  # 1-D array to 2-D column vector
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=5)  # train/test split
    X_train = process_features(X_train)  # standardize the training features
    X_test = process_features(X_test)    # standardize the test features (refits the scaler on the test data)
    print(X_train.shape, y_train.shape)
    model = LinearRegression()
    model.fit(X_train, y_train, eta=0.001, epsilon=0.0001)  # train with gradient descent
    y_pred = model.predict(X_test)  # predict on the test set
    print(y_pred.shape, y_test.shape, model.w.shape)
    mse = mean_squared_error(y_test, y_pred)  # mean squared error
    score = r2_score(y_test, y_pred)  # R^2 value
    print("mse={} and r2={}".format(mse, score))
    y_test = addLine(y_test)  # add the index column for plotting
    y_pred = addLine(y_pred)
    printLine('Range', 'Expected value', 'Solving the diabetes problem with gradient descent', y_pred, y_test)
