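The diabetes dataset ships with scikit-learn (sklearn). For each of 442 diabetes patients it records 10 baseline features: age, sex, body mass index, average blood pressure, and six blood serum measurements. It also records a quantitative measure of disease progression one year later, which serves as the label.
The task is to predict this disease-progression value from the 10 features.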
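A minimal sketch of the loading and preprocessing steps, assuming a recent scikit-learn. It loads the dataset together with its feature names and reproduces the 60/40 split used in the code (`test_size=0.4`, `random_state=5`). It then prepends a bias column for the linear models and expands the features to degree 2 for the ridge model. The names `x_train_lin`, `x_train_poly` and `x_test_poly` are illustrative only.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures

# Load the dataset as a Bunch so the feature names are available as well.
data = load_diabetes()
X, y = data.data, data.target
print(X.shape, y.shape)    # (442, 10) (442,)
print(data.feature_names)  # ['age', 'sex', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6']

# Same split as in the post: 40% of the samples go to the test set.
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=5)

# Linear models: prepend a column of ones so one weight acts as the bias.
# (x_train_lin is an illustrative name, not from the original code.)
x_train_lin = np.c_[np.ones((x_train.shape[0], 1)), x_train]

# Ridge model: degree-2 polynomial expansion, fitted on the training set
# and only applied (transform) to the test set.
poly = PolynomialFeatures(degree=2)
x_train_poly = poly.fit_transform(x_train)
x_test_poly = poly.transform(x_test)
print(x_train_lin.shape, x_train_poly.shape)  # (265, 11) (265, 66)
```

The degree-2 expansion of 10 features yields 66 columns: 1 constant, 10 linear terms and 55 quadratic terms. With 265 training samples, this is a much richer model than the 11-column linear one.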
The walkthrough covers data loading, data preprocessing, a description of the algorithms, and the main code.
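For reference, the two closed-form solutions that the code relies on can be stated compactly. Setting the gradient of each objective to zero gives the normal equation and its ridge counterpart. As in the `RidgeRegression` class, the identity $I$ here covers every column of $X$, including a bias column if one is present.

```latex
\begin{aligned}
\text{Normal equation:}\quad & \min_{w}\ \lVert Xw - y \rVert_2^2
  \;\Rightarrow\; X^\top (Xw - y) = 0
  \;\Rightarrow\; \hat{w} = (X^\top X)^{-1} X^\top y \\
\text{Ridge regression:}\quad & \min_{w}\ \lVert Xw - y \rVert_2^2 + \lambda \lVert w \rVert_2^2
  \;\Rightarrow\; (X^\top X + \lambda I)\,w = X^\top y
  \;\Rightarrow\; \hat{w} = (X^\top X + \lambda I)^{-1} X^\top y \\
\text{Prediction:}\quad & \hat{y} = X\hat{w}
\end{aligned}
```

With $\lambda = 0$ the ridge solution reduces to the normal equation. Any $\lambda > 0$ also makes $X^\top X + \lambda I$ invertible even when $X^\top X$ is singular.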
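Conclusion: on this train/test split, the normal-equation model and scikit-learn's LinearRegression (both using the original 10 features plus a bias column) predicted better than the ridge regression model (Lambda = 0.2 on degree-2 polynomial features). Because the ridge model also uses a different feature set, this comparison does not isolate the effect of regularization alone.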
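A sketch of a more controlled comparison, assuming scikit-learn is installed. It uses scikit-learn's own `LinearRegression` and `Ridge`, not the author's classes. All models share one split, and ridge is tried both on the original features and on degree-2 features across a few regularization strengths. `Ridge`'s `alpha` minimizes the same objective $\lVert Xw - y\rVert^2 + \alpha\lVert w\rVert^2$ as `Lambda`. Unlike `RidgeRegression`, however, it fits an unpenalized intercept. The alpha values are arbitrary illustrative choices, and no particular outcome is implied.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

X, y = load_diabetes(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=5)

def evaluate(model):
    """Fit on the training split; return (MSE, R^2) on the test split."""
    model.fit(x_train, y_train)
    y_pred = model.predict(x_test)
    return mean_squared_error(y_test, y_pred), r2_score(y_test, y_pred)

results = {"linear": evaluate(LinearRegression())}
# Illustrative alphas: the same regularization on both feature sets,
# so features and regularization can be compared separately.
for alpha in [0.01, 0.1, 0.2, 1.0]:
    results[f"ridge alpha={alpha}"] = evaluate(Ridge(alpha=alpha))
    results[f"poly2 + ridge alpha={alpha}"] = evaluate(
        make_pipeline(PolynomialFeatures(degree=2), Ridge(alpha=alpha)))

for name, (mse, r2) in results.items():
    print(f"{name:28s} mse={mse:9.2f} r2={r2:.3f}")
```

Reading across one row of alphas shows the effect of regularization strength. Comparing "ridge" with "poly2 + ridge" at the same alpha shows the effect of the polynomial features.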
Machine learning GitHub repository: https://github.com/wanglei18/machine_learning
ridge_regression.py
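import numpy as np

class RidgeRegression:
    """Ridge regression solved in closed form: w = (X^T X + Lambda * I)^(-1) X^T y."""

    def __init__(self, Lambda):
        self.Lambda = Lambda  # regularization strength

    def fit(self, X, y):
        m, n = X.shape
        # Lambda * I; every column is penalized, including a bias column if X has one
        r = np.diag(self.Lambda * np.ones(n))
        # Solve (X^T X + r) w = X^T y rather than forming the inverse explicitly
        self.w = np.linalg.solve(X.T.dot(X) + r, X.T.dot(y))
        return self

    def predict(self, X):
        return X.dot(self.w)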
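One way to sanity-check the closed form used in `RidgeRegression` is sketched below, with NumPy only and on synthetic data. The helpers `ridge_closed_form` and `ridge_via_lstsq` are hypothetical names. The check rests on the fact that ridge regression equals ordinary least squares on an augmented system $[X; \sqrt{\lambda} I]\,w \approx [y; 0]$, and that $\lambda = 0$ recovers the normal equation.

```python
import numpy as np

def ridge_closed_form(X, y, lam):
    """Hypothetical helper: (X^T X + lam*I)^(-1) X^T y, penalizing every column."""
    n = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ y)

def ridge_via_lstsq(X, y, lam):
    """Hypothetical helper: the same problem as plain least squares on
    the augmented system [X; sqrt(lam)*I] w ~ [y; 0]."""
    n = X.shape[1]
    X_aug = np.vstack([X, np.sqrt(lam) * np.eye(n)])
    y_aug = np.concatenate([y, np.zeros(n)])
    w, *_ = np.linalg.lstsq(X_aug, y_aug, rcond=None)
    return w

# Synthetic data: a bias column plus 4 random features.
rng = np.random.default_rng(0)
X = np.c_[np.ones(50), rng.normal(size=(50, 4))]
y = X @ np.array([3.0, 1.0, -2.0, 0.5, 0.0]) + rng.normal(scale=0.1, size=50)

w_closed = ridge_closed_form(X, y, lam=0.2)
w_lstsq = ridge_via_lstsq(X, y, lam=0.2)
w_ols = ridge_closed_form(X, y, lam=0.0)  # lam = 0 is the normal equation
print(np.allclose(w_closed, w_lstsq))                    # True
print(np.linalg.norm(w_closed) < np.linalg.norm(w_ols))  # True: ridge shrinks the weights
```

The augmented form is also a numerically convenient way to fit ridge models when $X^\top X$ is badly conditioned.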
# Homework 2, part 2
import sklearn.datasets
import numpy as np
import machine_learning.linear_regression.lib.linear_regression as lib
import machine_learning.linear_regression.lib.ridge_regression as Rg
from sklearn import linear_model
from sklearn.preprocessing import PolynomialFeatures
from sklearn.model_selection import train_test_split

def process_features(X):
    """Prepend a column of ones so the first weight acts as the bias (intercept)."""
    m, n = X.shape
    X = np.c_[np.ones((m, 1)), X]
    return X

np.random.seed(100)
X, y = sklearn.datasets.load_diabetes(return_X_y=True)

# 1. Normal equation
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=5)
x_train = process_features(x_train)  # add the bias column
x_test = process_features(x_test)

model = lib.LinearRegression()
model.fit(x_train, y_train)  # train the model

y_pred = model.predict(x_test)
mse = lib.mean_squared_error(y_test, y_pred)  # mean squared error of the predictions
r2 = lib.r2_score(y_test, y_pred)  # coefficient of determination R^2
print("Normal equation: mse={} and r2={}".format(mse, r2))

# 2. Ridge regression
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=5)
poly = PolynomialFeatures(degree=2)
x_poly = poly.fit_transform(x_train)  # degree-2 polynomial features (includes a bias column)
model = Rg.RidgeRegression(Lambda=0.2)
model.fit(x_poly, y_train)  # train the model
x_test = poly.transform(x_test)  # apply the same polynomial expansion to the test set
y_pred = model.predict(x_test)  # predict
mse = lib.mean_squared_error(y_test, y_pred)  # mean squared error of the predictions
r2 = lib.r2_score(y_test, y_pred)  # coefficient of determination R^2
print("Ridge regression: mse={} and r2={}".format(mse, r2))

# 3. Scikit-learn
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=5)
x_train = process_features(x_train)  # add the bias column
x_test = process_features(x_test)
clf = linear_model.LinearRegression()  # fits its own intercept too, so the bias column is redundant but harmless
clf.fit(x_train, y_train)  # train the model
y_pred = clf.predict(x_test)  # predict
mse = lib.mean_squared_error(y_test, y_pred)  # mean squared error of the predictions
r2 = lib.r2_score(y_test, y_pred)  # coefficient of determination R^2
print("Scikit-learn: mse={} and r2={}".format(mse, r2))
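The script scores every model with `lib.mean_squared_error` and `lib.r2_score` from the author's repository, whose source is not shown here. Assuming they follow the standard definitions, a NumPy sketch of both metrics looks like this. scikit-learn also provides `sklearn.metrics.mean_squared_error` and `sklearn.metrics.r2_score` with the same meaning.

```python
import numpy as np

# Sketch under the assumption of standard definitions; not the author's lib code.
def mean_squared_error(y_true, y_pred):
    """MSE = (1/m) * sum((y - y_hat)^2)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.mean((y_true - y_pred) ** 2)

def r2_score(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot, with SS_tot measured around the mean of y_true."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.0, 7.5, 9.0])
print(mean_squared_error(y_true, y_pred))  # 0.125
print(r2_score(y_true, y_pred))            # 0.975
```

Lower MSE and higher $R^2$ are better. $R^2 = 1$ means perfect predictions, and $R^2 = 0$ means the model does no better than always predicting the mean of the test labels.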