- import pandas as pd
- import matplotlib
- import sklearn
- from sklearn.linear_model import LinearRegression
- import matplotlib.pyplot as plt
-
- for i in [pd, matplotlib, sklearn]:
-     print(i.__name__, ": ", i.__version__, sep="")
Output:
- pandas: 0.25.3
- matplotlib: 3.1.2
- sklearn: 0.21.3
- columns = ["mpg", "cylinders", "displacement", "horsepower", "weight", "acceleration", "model year", "origin", "car name"]
- cars = pd.read_table("auto-mpg.data", delim_whitespace=True, names=columns)
- print(cars.head(5))
- print(cars.dtypes)
Output:
-     mpg  cylinders  displacement  horsepower  weight  acceleration  \
- 0  18.0          8         307.0       130.0  3504.0          12.0
- 1  15.0          8         350.0       165.0  3693.0          11.5
- 2  18.0          8         318.0       150.0  3436.0          11.0
- 3  16.0          8         304.0       150.0  3433.0          12.0
- 4  17.0          8         302.0       140.0  3449.0          10.5
-
-    model year  origin                   car name
- 0          70       1  chevrolet chevelle malibu
- 1          70       1          buick skylark 320
- 2          70       1         plymouth satellite
- 3          70       1              amc rebel sst
- 4          70       1                ford torino
- mpg             float64
- cylinders         int64
- displacement    float64
- horsepower      float64
- weight          float64
- acceleration    float64
- model year        int64
- origin            int64
- car name         object
- dtype: object
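The dtypes above show `horsepower` loading as `float64`, which suggests the file used here contains no non-numeric placeholders. If your copy of the data marks missing horsepower values with `?` (an assumption about your file, not something shown in this post), the column would load as `object` instead. A minimal sketch of one way to handle that follows. The three sample rows are illustrative only, and `sep=r"\s+"` is used in place of `delim_whitespace=True`, which newer pandas releases deprecate.

```python
import io

import pandas as pd

columns = ["mpg", "cylinders", "displacement", "horsepower", "weight",
           "acceleration", "model year", "origin", "car name"]

# Illustrative rows in the same whitespace-delimited layout. The second row
# uses "?" for horsepower to stand in for a missing value (an assumption).
sample = '''18.0 8 307.0 130.0 3504. 12.0 70 1 "chevrolet chevelle malibu"
25.0 4 98.00 ? 2046. 19.0 71 1 "ford pinto"
15.0 8 350.0 165.0 3693. 11.5 70 1 "buick skylark 320"
'''

# na_values="?" turns the placeholder into NaN, so the column stays numeric.
raw = pd.read_csv(io.StringIO(sample), sep=r"\s+", names=columns, na_values="?")

# Drop rows without horsepower before fitting a model on that column.
clean = raw.dropna(subset=["horsepower"]).reset_index(drop=True)

print(raw["horsepower"].dtype, len(raw), len(clean))
```

Dropping rows is only one option; imputing the missing values may be preferable when the data set is small.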

- fig = plt.figure(figsize=(10, 10), tight_layout=True)
-
- features = ["cylinders", "displacement", "horsepower", "weight",
-             "acceleration", "model year", "origin"]
-
- # One scatter plot per feature against mpg, laid out on a 3x3 grid
- for i, item in enumerate(features, start=1):
-     ax = fig.add_subplot(3, 3, i)
-     ax.scatter(cars[item], cars["mpg"])
-     ax.set_xlabel(item)
-     ax.set_ylabel("mpg")
-
- plt.show()
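Output: a grid of scatter plots, one per feature, each plotted against mpg (figure not shown).
The plots show that displacement, horsepower, and weight each have a fairly clear negative relationship with mpg, while acceleration is only weakly related to it. The remaining features (cylinders, model year, origin) take discrete values, so a scatter plot does not show their relationship with mpg well.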
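The visual impression from the scatter plots can be backed up with Pearson correlation coefficients. The sketch below uses only the first five rows printed earlier so that it runs without the data file. Five rows are far too few to draw conclusions from, so treat this as a demonstration of the call and run it on the full `cars` DataFrame for meaningful values.

```python
import pandas as pd

# First five rows of the continuous columns, copied from the printed table.
sample = pd.DataFrame({
    "mpg": [18.0, 15.0, 18.0, 16.0, 17.0],
    "displacement": [307.0, 350.0, 318.0, 304.0, 302.0],
    "horsepower": [130.0, 165.0, 150.0, 150.0, 140.0],
    "weight": [3504.0, 3693.0, 3436.0, 3433.0, 3449.0],
    "acceleration": [12.0, 11.5, 11.0, 12.0, 10.5],
})

# Pearson correlation of each feature with mpg, most negative first.
corr_with_mpg = sample.corr()["mpg"].drop("mpg").sort_values()
print(corr_with_mpg)
```

On the full data set the same expression would be `cars.corr()["mpg"]`, restricted to the numeric columns. Values close to -1 or 1 indicate a strong linear relationship, and values near 0 a weak one.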
Use vehicle weight (weight) to predict fuel economy in miles per gallon (mpg)
- lr = LinearRegression(fit_intercept=True)  # initialize the model
- lr.fit(cars[["weight"]], cars["mpg"])  # train
-
- predictions = lr.predict(cars[["weight"]])  # predict
- print(predictions[0:5])
- print(cars["mpg"][0:5])
Output:
- [19.41852276 17.96764345 19.94053224 19.96356207 19.84073631]
- 0 18.0
- 1 15.0
- 2 18.0
- 3 16.0
- 4 17.0
- Name: mpg, dtype: float64
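A fitted single-feature `LinearRegression` is just a straight line, so its predictions can be reproduced from `coef_` and `intercept_`. The sketch below fits on the five rows printed earlier (not the full data set, so the slope and intercept will differ from the model in this post) and checks that `intercept + slope * weight` matches `predict`.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# First five rows of weight and mpg, copied from the printed table.
sample = pd.DataFrame({
    "weight": [3504.0, 3693.0, 3436.0, 3433.0, 3449.0],
    "mpg": [18.0, 15.0, 18.0, 16.0, 17.0],
})

lr = LinearRegression(fit_intercept=True)
lr.fit(sample[["weight"]], sample["mpg"])

slope = lr.coef_[0]        # change in mpg per unit of weight
intercept = lr.intercept_  # predicted mpg at weight == 0

# Rebuild the predictions by hand from the line equation.
manual = intercept + slope * sample["weight"].to_numpy()
matches = np.allclose(manual, lr.predict(sample[["weight"]]))
print(slope, intercept, matches)
```

A negative slope here agrees with the scatter plot: heavier cars tend to have lower mpg.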
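- plt.scatter(cars["weight"], cars["mpg"], c='red', label="mpg")
- plt.scatter(cars["weight"], predictions, c='blue', label="predictions")
- plt.legend()  # add a legend
- plt.show()
Output: a scatter plot of the actual mpg values (red) and the predictions (blue) against weight (figure not shown).
Model evaluation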
- lr = LinearRegression()
- lr.fit(cars[["weight"]], cars["mpg"])
- predictions = lr.predict(cars[["weight"]])
-
- from sklearn.metrics import mean_squared_error  # import the mean squared error function
- mse = mean_squared_error(cars["mpg"], predictions)
- print(mse)
Output:
18.780939734628394
- mse = mean_squared_error(cars["mpg"], predictions)
- rmse = mse ** 0.5  # square root of the mean squared error
- print(rmse)
Output:
4.333698159150957
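For reference, the mean squared error is the average of the squared differences between actual and predicted values, and the RMSE is its square root, which puts the error back in mpg units. The sketch below computes both by hand with NumPy and compares the result with `mean_squared_error`. It uses only the five actual values and five predictions printed earlier, so its numbers will not match the full-data results above.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# First five actual mpg values and weight-based predictions printed earlier.
y_true = np.array([18.0, 15.0, 18.0, 16.0, 17.0])
y_pred = np.array([19.41852276, 17.96764345, 19.94053224,
                   19.96356207, 19.84073631])

# MSE: mean of the squared residuals. RMSE: its square root.
mse_manual = np.mean((y_true - y_pred) ** 2)
rmse_manual = np.sqrt(mse_manual)

# The library function should agree with the manual calculation.
mse_sklearn = mean_squared_error(y_true, y_pred)
print(mse_manual, mse_sklearn, rmse_manual)
```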
- lr = LinearRegression(fit_intercept=True)  # initialize the model
- lr.fit(cars[["horsepower"]], cars["mpg"])  # train
-
- predictions = lr.predict(cars[["horsepower"]])  # predict
- print(predictions[0:5])
- print(cars["mpg"][0:5])
Output:
- [19.41604569 13.89148002 16.25915102 16.25915102 17.83759835]
- 0 18.0
- 1 15.0
- 2 18.0
- 3 16.0
- 4 17.0
- Name: mpg, dtype: float64
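- plt.scatter(cars["horsepower"], cars["mpg"], c='red', label="mpg")
- plt.scatter(cars["horsepower"], predictions, c='blue', label="predictions")
- plt.legend()  # add a legend
- plt.show()
Output: a scatter plot of the actual mpg values (red) and the predictions (blue) against horsepower (figure not shown).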
- from sklearn.metrics import mean_squared_error
-
- lr = LinearRegression()
- lr.fit(cars[["horsepower"]], cars["mpg"])
- predictions = lr.predict(cars[["horsepower"]])
-
- mse = mean_squared_error(cars["mpg"], predictions)
- rmse = mse ** 0.5  # square root of the mean squared error
- print(rmse)
Output:
4.893226230065713
You can analyze the other features in the same way by following the examples above.
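One way to repeat the analysis for several features at once is to loop over them, fit a single-feature model for each, and collect the RMSE values. The helper `rmse_per_feature` below is a hypothetical name introduced for this sketch, not part of scikit-learn. The demonstration again uses only the five rows printed earlier, so pass the full `cars` DataFrame to get meaningful numbers.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error


def rmse_per_feature(df, features, target="mpg"):
    """Fit one single-feature linear model per column; return training RMSEs."""
    scores = {}
    for name in features:
        model = LinearRegression().fit(df[[name]], df[target])
        mse = mean_squared_error(df[target], model.predict(df[[name]]))
        scores[name] = mse ** 0.5
    # Smallest RMSE (best fit on the training data) first.
    return pd.Series(scores).sort_values()


# First five rows copied from the printed table, for demonstration only.
sample = pd.DataFrame({
    "mpg": [18.0, 15.0, 18.0, 16.0, 17.0],
    "displacement": [307.0, 350.0, 318.0, 304.0, 302.0],
    "horsepower": [130.0, 165.0, 150.0, 150.0, 140.0],
    "weight": [3504.0, 3693.0, 3436.0, 3433.0, 3449.0],
    "acceleration": [12.0, 11.5, 11.0, 12.0, 10.5],
})

result = rmse_per_feature(
    sample, ["displacement", "horsepower", "weight", "acceleration"])
print(result)
```

Like the examples above, this measures error on the same rows the model was trained on, so it describes goodness of fit rather than performance on unseen cars.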