
Machine Learning: Linear Regression in Practice

Linear Regression in Practice

Main topics:

  • Implementing the linear regression equation
  • Gradient descent in action
  • Comparing different gradient descent strategies
  • Analyzing the fitted curves
  • Overfitting and underfitting
  • The role of regularization
  • Early stopping

1. Experiment goals and setup

import numpy as np
import os
%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt
plt.rcParams['axes.labelsize'] = 14
plt.rcParams['xtick.labelsize'] = 12
plt.rcParams['ytick.labelsize'] = 12
import warnings
# filter out warnings
warnings.filterwarnings('ignore')
np.random.seed(42)

The regression equation:

The fact that linear regression happens to have a closed-form solution can be treated as a lucky special case; the core idea in machine learning is to improve the parameters through iterative updates.

import numpy as np
X = 2*np.random.rand(100,1)
y = 4 + 3*X + np.random.randn(100,1)
# np.random.randn(100,1) adds Gaussian noise around the true line y = 4 + 3x
plt.plot(X, y, 'r.')
plt.xlabel('X_1')
plt.ylabel('y')
plt.axis([0,2,0,15])
plt.show()

[Figure: scatter plot of the generated data]

2. Solving for the parameters

X_b = np.c_[np.ones((100,1)), X]
theta_best = np.linalg.inv(X_b.T.dot(X_b)).dot(X_b.T).dot(y)
# normal equation: theta = (X^T X)^(-1) X^T y ; np.linalg.inv computes the matrix inverse

theta_best

array([[4.21509616],
       [2.77011339]])
X_new = np.array([[0],[2]])
# x = 0 and x = 2: the two endpoints used to draw the fitted line
X_new_b = np.c_[np.ones((2,1)), X_new]   # test data
y_predict = X_new_b.dot(theta_best)      # theta_best holds the learned weights
y_predict
# the predictions at x = 0 and x = 2, i.e. theta0 and theta0 + 2*theta1

array([[4.21509616],
       [9.75532293]])
plt.plot(X_new, y_predict, 'r--')
plt.plot(X, y, 'b.')
plt.axis([0,2,0,15])
plt.show()

[Figure: fitted regression line over the data points]

scikit-learn API documentation:

API Reference — scikit-learn 1.0.2 documentation

# https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html#sklearn.linear_model.LinearRegression
from sklearn.linear_model import LinearRegression
lin_reg = LinearRegression()
lin_reg.fit(X, y)
print(lin_reg.coef_)        # lin_reg.coef_: the weight parameter(s)
print(lin_reg.intercept_)   # lin_reg.intercept_: the bias (intercept)

[[2.77011339]]
[4.21509616]

3. The effect of preprocessing on the results

Gradient descent

Gradient descent is the core solution strategy: it is used not only in linear regression but in many other algorithms as well, such as neural networks.

Problem: if the step size (learning rate) is too small, convergence is very slow.

Problem: if the step size is too large, the updates overshoot the minimum and the iteration may fail to converge.

The learning rate should be kept reasonably small, and it should shrink as the iterations progress.
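As a minimal standalone sketch of what such a decaying schedule looks like (reusing the t0/t1 constants that appear in the SGD code later in this post):

# Sketch: a simple decaying learning-rate schedule
t0, t1 = 5, 50
def learning_schedule(t):
    # large steps at the beginning, progressively smaller steps as t grows
    return t0 / (t1 + t)

for t in (0, 10, 100, 1000):
    print(t, learning_schedule(t))   # 0.1, 0.083..., 0.033..., 0.0047...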

The role of standardization:

  • After obtaining the data, you should almost always standardize it first (a minimal sketch follows below).
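A small sketch of that step with scikit-learn's StandardScaler (the same transformer used in the pipelines later in this post; the variable names here are mine):

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)   # for each feature: (x - mean) / std
print(X_scaled.mean(axis=0), X_scaled.std(axis=0))   # approximately 0 and 1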

Batch gradient descent update formula: gradients = (2/m) * X^T (X·theta - y), followed by theta := theta - eta * gradients.

4. Gradient descent

Batch gradient descent

eta = 0.1                      # learning rate
n_iterations = 1000            # number of iterations
m = 100                        # number of samples
theta = np.random.randn(2,1)   # random initialization
for iteration in range(n_iterations):
    gradients = 2/m * X_b.T.dot(X_b.dot(theta) - y)   # (2/m) * X^T (X·theta - y)
    theta = theta - eta*gradients                     # theta := theta - learning rate * gradient

theta

array([[4.21509616],
       [2.77011339]])

X_new_b.dot(theta)

array([[4.21509616],
       [9.75532293]])

5. The effect of the learning rate

theta_path_bgd = []
def plot_gradient_descent(theta, eta, theta_path=None):
    m = len(X_b)
    plt.plot(X, y, 'b.')
    n_iterations = 1000
    for iteration in range(n_iterations):
        y_predict = X_new_b.dot(theta)                    # current prediction
        plt.plot(X_new, y_predict, 'b-')
        gradients = 2/m * X_b.T.dot(X_b.dot(theta) - y)
        theta = theta - eta*gradients                     # update theta
        if theta_path is not None:
            theta_path.append(theta)
    plt.xlabel('X_1')
    plt.axis([0,2,0,15])
    plt.title('eta = {}'.format(eta))
theta = np.random.randn(2,1)
plt.figure(figsize=(10,4))
plt.subplot(131)
plot_gradient_descent(theta, eta=0.02)
plt.subplot(132)
plot_gradient_descent(theta, eta=0.1, theta_path=theta_path_bgd)
plt.subplot(133)
plot_gradient_descent(theta, eta=0.5)
plt.show()

6. Results of stochastic gradient descent

theta_path_sgd = []
m = len(X_b)
np.random.seed(42)
n_epochs = 50
t0 = 5
t1 = 50
def learning_schedule(t):
    return t0/(t1 + t)
theta = np.random.randn(2,1)   # initialize theta
for epoch in range(n_epochs):
    for i in range(m):
        if epoch < 10 and i < 10:
            y_predict = X_new_b.dot(theta)
            plt.plot(X_new, y_predict, 'r-')
        random_index = np.random.randint(m)
        xi = X_b[random_index:random_index+1]          # pick a single random sample
        yi = y[random_index:random_index+1]
        gradients = 2 * xi.T.dot(xi.dot(theta) - yi)   # gradient computed on one sample
        eta = learning_schedule(epoch*m + i)           # decaying learning rate
        theta = theta - eta*gradients                  # update theta
        theta_path_sgd.append(theta)
plt.plot(X, y, 'b.')
plt.axis([0,2,0,15])
plt.show()

[Figure: SGD prediction lines during the first epochs, converging toward the data]

7. Mini-batch gradient descent

theta_path_mgd = []
n_epochs = 50                  # number of epochs
minibatch = 16                 # mini-batch size
theta = np.random.randn(2,1)   # initialize theta
t0, t1 = 200, 1000
def learning_schedule(t):
    return t0 / (t + t1)
np.random.seed(42)
t = 0
for epoch in range(n_epochs):
    shuffled_indices = np.random.permutation(m)   # shuffle the data
    X_b_shuffled = X_b[shuffled_indices]          # reorder X_b with the shuffled indices
    y_shuffled = y[shuffled_indices]
    for i in range(0, m, minibatch):
        t += 1
        xi = X_b_shuffled[i:i+minibatch]
        yi = y_shuffled[i:i+minibatch]
        gradients = 2/minibatch * xi.T.dot(xi.dot(theta) - yi)
        eta = learning_schedule(t)      # decaying learning rate; t is the step counter
        theta = theta - eta*gradients   # update theta
        theta_path_mgd.append(theta)

theta

array([[4.25490685],
       [2.80388784]])

8. Comparing the different strategies

theta_path_bgd = np.array(theta_path_bgd)
theta_path_sgd = np.array(theta_path_sgd)
theta_path_mgd = np.array(theta_path_mgd)

plt.figure(figsize=(12,6))
plt.plot(theta_path_sgd[:,0], theta_path_sgd[:,1], 'r-s', linewidth=1, label='SGD')   # column 0 -> theta0, column 1 -> theta1
plt.plot(theta_path_mgd[:,0], theta_path_mgd[:,1], 'g-+', linewidth=2, label='MINIGD')
plt.plot(theta_path_bgd[:,0], theta_path_bgd[:,1], 'b-o', linewidth=3, label='BGD')
plt.legend(loc='upper left')
plt.axis([3.5,4.5,2.0,4.0])
plt.show()

In practice, mini-batch gradient descent is the most common choice; as a rule of thumb, choose the batch size as large as your resources allow.

9. Polynomial regression

m = 100
X = 6*np.random.rand(m,1) - 3
y = 0.5*X**2 + X + np.random.randn(m,1)   # randn adds Gaussian noise

plt.plot(X, y, 'b.')
plt.xlabel('X_1')
plt.ylabel('y')
plt.axis([-3,3,-5,10])
plt.show()

[Figure: scatter plot of the quadratic data]

from sklearn.preprocessing import PolynomialFeatures
poly_features = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly_features.fit_transform(X)   # fit learns the polynomial expansion, transform applies it; fit_transform does both

X[0]
array([2.38942838])

X_poly[0]   # the original feature x = 2.38942838 plus its square, 2.38942838**2 = 5.709368
array([2.38942838, 5.709368  ])
from sklearn.linear_model import LinearRegression
lin_reg = LinearRegression()
lin_reg.fit(X_poly, y)
print(lin_reg.coef_)
print(lin_reg.intercept_)
# Recovered equation: y = -0.0264767 + 0.95038538*x + 0.52577032*x**2,
# close to the true relationship y = x + 0.5*x**2

[[0.95038538 0.52577032]]
[-0.0264767]
X_new = np.linspace(-3,3,100).reshape(100,1)
X_new_poly = poly_features.transform(X_new)
y_new = lin_reg.predict(X_new_poly)
plt.plot(X, y, 'b.')
plt.plot(X_new, y_new, 'r--', label='prediction')
plt.axis([-3,3,-5,10])
plt.legend()
plt.show()

[Figure: fitted quadratic curve over the data]

10. Model complexity

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler   # standardization module
plt.figure(figsize=(12,6))
for style, width, degree in (('g-',1,100), ('m--',1,2), ('r-+',1,1)):
    poly_features = PolynomialFeatures(degree=degree, include_bias=False)
    std = StandardScaler()
    lin_reg = LinearRegression()
    polynomial_reg = Pipeline([('poly_features', poly_features),
                               ('StandardScaler', std),   # standardization step
                               ('lin_reg', lin_reg)])     # regression step
    polynomial_reg.fit(X, y)
    y_new_2 = polynomial_reg.predict(X_new)
    plt.plot(X_new, y_new_2, style, label='degree '+str(degree), linewidth=width)
plt.plot(X, y, 'b.')
plt.axis([-3,3,-5,10])
plt.legend()
plt.show()

The more complex the feature transformation, the higher the risk of overfitting, so avoid making it unnecessarily complex. One way to check this numerically is shown below.
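A sketch of that check (not part of the original notebook: the 5-fold cross-validation and the two degrees compared are my own choices) comparing cross-validated errors for a low-degree and a very high-degree pipeline:

from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import LinearRegression

for degree in (2, 100):
    pipe = Pipeline([('poly_features', PolynomialFeatures(degree=degree, include_bias=False)),
                     ('StandardScaler', StandardScaler()),
                     ('lin_reg', LinearRegression())])
    # cross_val_score returns negative MSE, so flip the sign;
    # a much larger validation error for degree=100 is the signature of overfitting
    mse = -cross_val_score(pipe, X, y, scoring='neg_mean_squared_error', cv=5).mean()
    print(degree, mse)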

11. The effect of sample size on the results

How the number of training samples affects the result:

sklearn.metrics.mean_squared_error — scikit-learn 1.0.2 documentation

from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
def plot_learning_curves(model, X, y):
    X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=100)
    train_errors, val_errors = [], []
    for m in range(1, len(X_train)):
        model.fit(X_train[:m], y_train[:m])
        y_train_predict = model.predict(X_train[:m])   # predictions on the training subset
        y_val_predict = model.predict(X_val)           # predictions on the validation set
        train_errors.append(mean_squared_error(y_train[:m], y_train_predict))
        val_errors.append(mean_squared_error(y_val, y_val_predict))
    plt.plot(np.sqrt(train_errors), 'r-+', linewidth=2, label='train_error')   # RMSE = sqrt of the mean squared error
    plt.plot(np.sqrt(val_errors), 'b-', linewidth=3, label='val_error')
    plt.xlabel('Training set size')
    plt.ylabel('RMSE')
    plt.legend()

lin_reg = LinearRegression()
plot_learning_curves(lin_reg, X, y)
plt.axis([0,80,0,5])
plt.show()

[Figure: learning curves for plain linear regression]

With very little data the model fits the training set almost perfectly, but its performance on held-out data is mediocre. When building a model in practice, judge it by its performance on the validation and test sets.

Overfitting risk of polynomial regression

polynomial_reg = Pipeline([('poly_features', PolynomialFeatures(degree=25, include_bias=False)),
                           ('lin_reg', LinearRegression())])   # pipeline: feature expansion + regression
plot_learning_curves(polynomial_reg, X, y)
plt.axis([0,80,0,5])
plt.show()

[Figure: learning curves for the degree-25 polynomial model]

The more complex the model, the more severely it overfits.

12. The role of regularization

Regularization penalizes the weight parameters so that they stay small and the fitted function stays smooth. There are two common ways to apply the penalty: an L2 penalty on the weights (ridge regression) and an L1 penalty (lasso), sketched below.
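In terms of the cost function, the two penalties differ only in the norm applied to the weights. A small NumPy sketch (the helper names are mine, shown only to make the difference explicit; this is not how scikit-learn implements them internally):

import numpy as np

def ridge_cost(theta, X_b, y, alpha):
    # MSE plus an L2 penalty: alpha * sum of squared weights (bias term excluded)
    mse = np.mean((X_b.dot(theta) - y) ** 2)
    return mse + alpha * np.sum(theta[1:] ** 2)

def lasso_cost(theta, X_b, y, alpha):
    # MSE plus an L1 penalty: alpha * sum of absolute weights (bias term excluded)
    mse = np.mean((X_b.dot(theta) - y) ** 2)
    return mse + alpha * np.sum(np.abs(theta[1:]))

The L2 penalty shrinks all weights smoothly toward zero, while the L1 penalty can drive some weights exactly to zero, which is why lasso also acts as a form of feature selection.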

from sklearn.linear_model import Ridge
np.random.seed(42)
m = 20
X = 3*np.random.rand(m,1)
y = 0.5*X + np.random.randn(m,1)/1.5 + 1
X_new = np.linspace(0,3,100).reshape(100,1)
def plot_model(model_class, polynomial, alphas, **model_kargs):
    for alpha, style in zip(alphas, ('b-','g--','r:')):
        model = model_class(alpha, **model_kargs)
        if polynomial:
            model = Pipeline([('poly_features', PolynomialFeatures(degree=10, include_bias=False)),
                              ('StandardScaler', StandardScaler()),
                              ('lin_reg', model)])
        model.fit(X, y)
        y_new_regul = model.predict(X_new)
        lw = 2 if alpha > 0 else 1
        plt.plot(X_new, y_new_regul, style, linewidth=lw, label='alpha = {}'.format(alpha))
    plt.plot(X, y, 'b.', linewidth=3)
    plt.legend()

plt.figure(figsize=(14,6))
plt.subplot(121)
plot_model(Ridge, polynomial=False, alphas=(0,10,100))
plt.subplot(122)
plot_model(Ridge, polynomial=True, alphas=(0,10**-5,1))
plt.show()

The stronger the penalty (the larger the alpha value), the smoother and flatter the resulting decision function becomes. You can verify this directly on the fitted coefficients, as in the quick check below.
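A quick check of the same effect (a sketch; it reuses X, y and Ridge from the block above, and the alpha values are just the ones from the left-hand plot):

for alpha in (0, 10, 100):
    ridge = Ridge(alpha=alpha)
    ridge.fit(X, y)
    # the slope in ridge.coef_ shrinks toward 0 as alpha increases
    print(alpha, ridge.coef_, ridge.intercept_)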

13. Ridge regression and lasso

# https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Ridge.html#sklearn.linear_model.Ridge
from sklearn.linear_model import Lasso
plt.figure(figsize=(14,6))
plt.subplot(121)
plot_model(Lasso, polynomial=False, alphas=(0,0.1,1))
plt.subplot(122)
plot_model(Lasso, polynomial=True, alphas=(0,10**-1,1))
plt.show()
