
Python Machine Learning Practice Project 1: Predicting Car Fuel Economy with Linear Regression

Predicting a car's fuel efficiency with a linear regression model

1. Import the Python libraries

    import pandas as pd
    import matplotlib
    import sklearn
    from sklearn.linear_model import LinearRegression
    import matplotlib.pyplot as plt

    # Print the version of each library used below
    for lib in [pd, matplotlib, sklearn]:
        print(lib.__name__, ": ", lib.__version__, sep="")

Output:

    pandas: 0.25.3
    matplotlib: 3.1.2
    sklearn: 0.21.3

2. Load the data

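    columns = ["mpg", "cylinders", "displacement", "horsepower", "weight",
               "acceleration", "model year", "origin", "car name"]
    # The file is whitespace-delimited and has no header row.
    # sep=r"\s+" is equivalent to delim_whitespace=True, which newer pandas releases deprecate.
    cars = pd.read_table("auto-mpg.data", sep=r"\s+", names=columns)
    print(cars.head(5))
    print(cars.dtypes)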

Output:

        mpg  cylinders  displacement  horsepower  weight  acceleration  \
    0  18.0          8         307.0       130.0  3504.0          12.0
    1  15.0          8         350.0       165.0  3693.0          11.5
    2  18.0          8         318.0       150.0  3436.0          11.0
    3  16.0          8         304.0       150.0  3433.0          12.0
    4  17.0          8         302.0       140.0  3449.0          10.5

       model year  origin                   car name
    0          70       1  chevrolet chevelle malibu
    1          70       1          buick skylark 320
    2          70       1         plymouth satellite
    3          70       1              amc rebel sst
    4          70       1                ford torino

    mpg             float64
    cylinders         int64
    displacement    float64
    horsepower      float64
    weight          float64
    acceleration    float64
    model year        int64
    origin            int64
    car name         object
    dtype: object
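
The dtypes above show horsepower loading as float64, so the file used here parses cleanly. The raw file distributed by the UCI Machine Learning Repository marks missing horsepower values with "?", which would make that column load as object instead. A minimal sketch of handling that case, assuming you have such a file; the two rows in `raw_text` are made up for illustration and are not taken from the dataset:

```python
import io
import pandas as pd

columns = ["mpg", "cylinders", "displacement", "horsepower", "weight",
           "acceleration", "model year", "origin", "car name"]

# Two made-up rows in the same layout; the second has a missing horsepower.
raw_text = '''18.0 8 307.0 130.0 3504.0 12.0 70 1 "sample car a"
25.0 4 98.0 ? 2046.0 19.0 71 1 "sample car b"
'''

# na_values="?" turns the placeholder into NaN so the column stays numeric.
sample = pd.read_csv(io.StringIO(raw_text), sep=r"\s+", names=columns,
                     na_values="?")
print(sample["horsepower"].isna().sum())  # number of missing values

# LinearRegression cannot fit on NaN, so drop the incomplete rows first.
sample = sample.dropna(subset=["horsepower"])
print(len(sample))
```

Dropping rows is only one option; filling in a value such as the column mean would also work, at the cost of inventing data for those cars.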

3. Examine the relationships between the features

    fig = plt.figure(figsize=(10, 10), tight_layout=True)
    features = ["cylinders", "displacement", "horsepower", "weight",
                "acceleration", "model year", "origin"]
    # One scatter plot of mpg against each feature, laid out on a 3x3 grid
    for i, item in enumerate(features, start=1):
        ax = fig.add_subplot(3, 3, i)
        ax.scatter(cars[item], cars["mpg"])
        ax.set_xlabel(item)
        ax.set_ylabel("mpg")
    plt.show()

Output: (scatter plots of mpg against each of the seven features)

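The plots show that displacement, horsepower, and weight are clearly (and negatively) correlated with mpg, while acceleration is only weakly correlated. The remaining features (cylinders, model year, origin) take discrete values, so a scatter plot does not show their relationship with mpg well.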
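
The visual impression can be backed by a number. The sketch below computes the Pearson correlation coefficient by hand for weight and mpg, using only the first five rows printed in section 2, so the value is illustrative rather than the correlation over the whole dataset. The helper name `pearson_r` is made up for this example; on the full DataFrame, `cars["weight"].corr(cars["mpg"])` computes the same quantity.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# First five rows of the dataset, as printed in section 2.
weight = [3504.0, 3693.0, 3436.0, 3433.0, 3449.0]
mpg = [18.0, 15.0, 18.0, 16.0, 17.0]

r = pearson_r(weight, mpg)
print(round(r, 3))  # negative: in these rows, heavier cars have lower mpg
```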

4. Train and evaluate the model

4.1 weight and mpg

Use vehicle weight (weight) to predict fuel economy in miles per gallon (mpg).

    lr = LinearRegression(fit_intercept=True)      # initialize the model
    lr.fit(cars[["weight"]], cars["mpg"])          # train
    predictions = lr.predict(cars[["weight"]])     # predict on the same rows
    print(predictions[0:5])
    print(cars["mpg"][0:5])

Output:

    [19.41852276 17.96764345 19.94053224 19.96356207 19.84073631]
    0    18.0
    1    15.0
    2    18.0
    3    16.0
    4    17.0
    Name: mpg, dtype: float64

Plot the actual values and the predictions against weight:

    plt.scatter(cars["weight"], cars["mpg"], c='red', label="mpg")
    plt.scatter(cars["weight"], predictions, c='blue', label="predictions")
    plt.legend()  # show the legend
    plt.show()

Output: (scatter plot of actual mpg in red and predicted mpg in blue against weight)
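
With a single feature, LinearRegression fits a straight line, mpg ≈ intercept + slope × weight, chosen to minimize the squared error; after fitting, scikit-learn exposes the two numbers as `lr.coef_` and `lr.intercept_`. The sketch below works the closed-form least-squares solution by hand. It uses only the first five rows printed in section 2, so its slope and intercept will differ from those of the model trained on the full dataset, and the helper name `fit_line` is made up for this example.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    # The fitted line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept

# First five rows of the dataset, as printed in section 2.
weight = [3504.0, 3693.0, 3436.0, 3433.0, 3449.0]
mpg = [18.0, 15.0, 18.0, 16.0, 17.0]

slope, intercept = fit_line(weight, mpg)
print(slope, intercept)
print(intercept + slope * 3504.0)  # prediction for the first car
```

A negative slope means each extra unit of weight lowers the predicted mpg, which matches the downward trend in the scatter plot.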

Model evaluation

    from sklearn.metrics import mean_squared_error  # mean squared error function

    lr = LinearRegression()
    lr.fit(cars[["weight"]], cars["mpg"])
    predictions = lr.predict(cars[["weight"]])
    mse = mean_squared_error(cars["mpg"], predictions)
    print(mse)

Output:

    18.780939734628394

Take the square root to get the root mean squared error (RMSE):

    mse = mean_squared_error(cars["mpg"], predictions)
    rmse = mse ** 0.5  # square root of the mean squared error
    print(rmse)

Output:

    4.333698159150957
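
The mean squared error is the average of the squared differences between actual and predicted values, MSE = (1/n) Σ (y_i − ŷ_i)², and the RMSE is its square root, which puts the error back in the same unit as mpg. As a rough illustration, the sketch below computes both by hand for the five predictions and actual values printed earlier in this section. Because it covers only five rows, the result is not expected to match the full-dataset figures above.

```python
import math

# First five predictions and actual mpg values printed in section 4.1.
predictions = [19.41852276, 17.96764345, 19.94053224, 19.96356207, 19.84073631]
actual = [18.0, 15.0, 18.0, 16.0, 17.0]

# Square each error, average the squares, then take the square root.
squared_errors = [(p - a) ** 2 for p, a in zip(predictions, actual)]
mse = sum(squared_errors) / len(squared_errors)
rmse = math.sqrt(mse)
print(mse, rmse)
```

Note that the model here is scored on the same rows it was trained on, so these numbers describe how well the line fits the data rather than how well it would predict unseen cars.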

4.2 horsepower and mpg

    lr = LinearRegression(fit_intercept=True)          # initialize the model
    lr.fit(cars[["horsepower"]], cars["mpg"])          # train
    predictions = lr.predict(cars[["horsepower"]])     # predict on the same rows
    print(predictions[0:5])
    print(cars["mpg"][0:5])

Output:

    [19.41604569 13.89148002 16.25915102 16.25915102 17.83759835]
    0    18.0
    1    15.0
    2    18.0
    3    16.0
    4    17.0
    Name: mpg, dtype: float64

Plot the actual values and the predictions against horsepower:

    plt.scatter(cars["horsepower"], cars["mpg"], c='red', label="mpg")
    plt.scatter(cars["horsepower"], predictions, c='blue', label="predictions")
    plt.legend()  # show the legend
    plt.show()

Output: (scatter plot of actual mpg in red and predicted mpg in blue against horsepower)

Model evaluation

    from sklearn.metrics import mean_squared_error

    lr = LinearRegression()
    lr.fit(cars[["horsepower"]], cars["mpg"])
    predictions = lr.predict(cars[["horsepower"]])
    mse = mean_squared_error(cars["mpg"], predictions)
    rmse = mse ** 0.5  # square root of the mean squared error
    print(rmse)

Output:

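    4.893226230065713

This RMSE is higher than the 4.33 obtained with weight, so on this data weight alone predicts mpg more accurately than horsepower alone.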

4.3 Other features and mpg

The other features can be analyzed in the same way, following the examples above.
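
One way to do this without repeating the code for every feature is to wrap the fit, predict, and RMSE steps in a function and loop over the column names. The following is a minimal sketch rather than part of the original walkthrough: the helper name `single_feature_rmse` is made up, and the small `sample` frame, built from the first five rows printed in section 2, stands in for `cars` so that the block runs on its own.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

def single_feature_rmse(df, feature, target="mpg"):
    """Fit target ~ feature on df and return the RMSE on the same rows."""
    model = LinearRegression()
    model.fit(df[[feature]], df[target])
    predictions = model.predict(df[[feature]])
    return mean_squared_error(df[target], predictions) ** 0.5

# Stand-in for `cars`: the first five rows printed in section 2.
sample = pd.DataFrame({
    "mpg": [18.0, 15.0, 18.0, 16.0, 17.0],
    "displacement": [307.0, 350.0, 318.0, 304.0, 302.0],
    "weight": [3504.0, 3693.0, 3436.0, 3433.0, 3449.0],
    "acceleration": [12.0, 11.5, 11.0, 12.0, 10.5],
})

# Replace `sample` with `cars` to score each feature on the full dataset.
for feature in ["displacement", "weight", "acceleration"]:
    print(feature, single_feature_rmse(sample, feature))
```

The feature with the lowest RMSE is the best single predictor of mpg under this measure.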

 
