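Iris classes and feature attributes:
Iris is a genus of perennial herbaceous plants. The dataset returned by sklearn.datasets.load_iris() groups the samples into three classes, setosa, versicolor and virginica, labeled 0, 1 and 2 respectively. Each sample has four features: sepal length, sepal width, petal length and petal width.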
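To make the label order and feature order concrete, here is a minimal sketch that reads them straight from the dataset object. The variable names (iris, class_names, feature_names) are only illustrative.

```python
from sklearn import datasets

iris = datasets.load_iris()

# Class names in label order: position 0, 1, 2 matches target values 0, 1, 2
class_names = [str(name) for name in iris.target_names]
# Feature names in the column order of iris.data
feature_names = list(iris.feature_names)

print(class_names)
print(feature_names)
print(iris.data.shape, iris.target.shape)  # (samples, features) and (samples,)
```

The column order matters later: petal width is the last column, which is why the model below slices with [:, 3:].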
Logistic regression: building the model
import numpy as np
from sklearn import datasets # scikit-learn's bundled datasets
import matplotlib.pyplot as plt # plotting
from sklearn.linear_model import LogisticRegression # logistic regression model
data_iris = datasets.load_iris() # the iris dataset
print(list(data_iris.keys()))
print(data_iris['DESCR'])
The output of print(list(data_iris.keys())) is (the exact keys can vary with the scikit-learn version):
['data', 'target', 'target_names', 'DESCR', 'feature_names', 'filename']
The output of print(data_iris['DESCR']) includes:
============== ==== ==== ======= ===== ====================
                Min  Max   Mean    SD   Class Correlation
============== ==== ==== ======= ===== ====================
sepal length:   4.3  7.9   5.84   0.83    0.7826
sepal width:    2.0  4.4   3.05   0.43   -0.4194
petal length:   1.0  6.9   3.76   1.76    0.9490  (high!)
petal width:    0.1  2.5   1.20   0.76    0.9565  (high!)
============== ==== ==== ======= ===== ====================
It is easy to see that petal length and petal width have the strongest correlation with the class label. To keep things simple for beginners, this walkthrough uses petal width alone as the predictor.
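The Class Correlation column can be approximated directly from the data. The sketch below assumes the column is a Pearson correlation between each feature and the integer class label; that is a guess at how the description table was produced, and the computed values may differ slightly from the table.

```python
import numpy as np
from sklearn import datasets

iris = datasets.load_iris()
X_all, y_all = iris.data, iris.target

# Pearson correlation between each feature column and the integer class label
# (treating labels 0, 1, 2 as numbers is an assumption of this sketch)
class_corr = np.array(
    [np.corrcoef(X_all[:, j], y_all)[0, 1] for j in range(X_all.shape[1])]
)

for name, r in zip(iris.feature_names, class_corr):
    print(f"{name:20s} {r:+.4f}")

# Index of the feature whose correlation has the largest magnitude
best = int(np.argmax(np.abs(class_corr)))
print("strongest:", iris.feature_names[best])
```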
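X = data_iris['data'][:, 3:] # 2-D array holding only the 'petal width (cm)' column
y = data_iris['target'] # class labels 0, 1, 2
# one-vs-rest with the SAG solver; multi_class is deprecated in newer scikit-learn releases
log_reg = LogisticRegression(multi_class='ovr', solver='sag')
log_reg.fit(X, y)
print(X) # petal widths of the 150 iris samples
print(y) # the class label of each sample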
The output of print(X) is:
[[0.2]
[0.2]
[0.2]
[0.2]
[0.2]
[0.4]
[0.3]
[0.2]
[0.2]
[0.1]
[0.2]
[0.2]
[0.1]
[0.1]
[0.2]
[0.4]
[0.4]
[0.3]
[0.3]
[0.3]
[0.2]
[0.4]
[0.2]
[0.5]
[0.2]
[0.2]
[0.4]
[0.2]
[0.2]
[0.2]
[0.2]
[0.4]
[0.1]
[0.2]
[0.2]
[0.2]
[0.2]
[0.1]
[0.2]
[0.2]
[0.3]
[0.3]
[0.2]
[0.6]
[0.4]
[0.3]
[0.2]
[0.2]
[0.2]
[0.2]
[1.4]
[1.5]
[1.5]
[1.3]
[1.5]
[1.3]
[1.6]
[1. ]
[1.3]
[1.4]
[1. ]
[1.5]
[1. ]
[1.4]
[1.3]
[1.4]
[1.5]
[1. ]
[1.5]
[1.1]
[1.8]
[1.3]
[1.5]
[1.2]
[1.3]
[1.4]
[1.4]
[1.7]
[1.5]
[1. ]
[1.1]
[1. ]
[1.2]
[1.6]
[1.5]
[1.6]
[1.5]
[1.3]
[1.3]
[1.3]
[1.2]
[1.4]
[1.2]
[1. ]
[1.3]
[1.2]
[1.3]
[1.3]
[1.1]
[1.3]
[2.5]
[1.9]
[2.1]
[1.8]
[2.2]
[2.1]
[1.7]
[1.8]
[1.8]
[2.5]
[2. ]
[1.9]
[2.1]
[2. ]
[2.4]
[2.3]
[1.8]
[2.2]
[2.3]
[1.5]
[2.3]
[2. ]
[2. ]
[1.8]
[2.1]
[1.8]
[1.8]
[1.8]
[2.1]
[1.6]
[1.9]
[2. ]
[2.2]
[1.5]
[1.4]
[2.3]
[2.4]
[1.8]
[1.8]
[2.1]
[2.4]
[2.3]
[1.9]
[2.3]
[2.5]
[2.3]
[1.9]
[2. ]
[2.3]
[1.8]]
The output of print(y) is:
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
2 2]
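Because newer scikit-learn releases deprecate the multi_class argument, the LogisticRegression call above may warn or fail depending on the installed version. One way to express the same one-vs-rest idea explicitly is to wrap a binary logistic regression in OneVsRestClassifier. This is a sketch, not the original setup: it uses the default solver rather than 'sag', so the fitted coefficients will not match exactly, and the names X_pw, y_cls and ovr_clf are illustrative.

```python
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

iris = datasets.load_iris()
X_pw = iris.data[:, 3:]   # petal width only, shape (150, 1)
y_cls = iris.target

# One binary logistic regression is fitted per class ("this class" vs. "the rest")
ovr_clf = OneVsRestClassifier(LogisticRegression())
ovr_clf.fit(X_pw, y_cls)

print(len(ovr_clf.estimators_))          # number of fitted binary models
print(ovr_clf.predict([[0.2], [2.4]]))   # a clearly small and a clearly large petal width
```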
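Logistic regression: predicting the class
For simplicity, we use numpy.linspace to generate 1000 evenly spaced values between 0 and 3 and reshape them into a two-dimensional array with a single column.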
X_new = np.linspace(0, 3, 1000).reshape(-1, 1)
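The reshape step is what turns the flat array into the (n_samples, n_features) layout that scikit-learn estimators expect. A smaller grid of 5 points (chosen here only to keep the output readable) shows the effect:

```python
import numpy as np

# 5 evenly spaced points from 0 to 3, both endpoints included
grid_1d = np.linspace(0, 3, 5)
# reshape(-1, 1) makes a single column; -1 lets NumPy infer the number of rows
grid_2d = grid_1d.reshape(-1, 1)

print(grid_1d.shape, grid_2d.shape)
print(grid_2d.ravel())
```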
Predict the class
y_hat = log_reg.predict(X_new) # predicted class for each petal width
print(y_hat)
y_prob = log_reg.predict_proba(X_new) # probability of each of the three classes
print(y_prob)
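The two calls are closely related: predict returns the class whose probability from predict_proba is largest, and each row of probabilities sums to 1. The following sketch checks both properties. It fits a LogisticRegression with default settings, which is an assumption made to avoid version-specific arguments, and the names clf, grid, proba and labels are illustrative.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression

iris = datasets.load_iris()
X_pw, y_cls = iris.data[:, 3:], iris.target

clf = LogisticRegression().fit(X_pw, y_cls)   # default settings (assumption of this sketch)

grid = np.linspace(0, 3, 1000).reshape(-1, 1)
proba = clf.predict_proba(grid)   # shape (1000, 3), one column per class
labels = clf.predict(grid)        # shape (1000,)

# Each row of probabilities sums to 1
rows_sum_to_one = np.allclose(proba.sum(axis=1), 1.0)
# The predicted label is the class with the largest probability
argmax_matches = np.array_equal(clf.classes_[proba.argmax(axis=1)], labels)

print(rows_sum_to_one, argmax_matches)
```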
Now predict the class for petal widths of 1.7, 1.5 and 0.5.
print(log_reg.predict([[1.7], [1.5], [0.5]]))
The result is [2 1 0]. At a petal width of 1.7 cm the flower is most likely virginica; at 1.5 cm, versicolor; at 0.5 cm, setosa.
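Those three predictions suggest that the model splits the petal-width axis into intervals, one per class. A rough way to locate the split points is to scan a fine grid and record where the predicted label changes. The sketch below again assumes a default LogisticRegression, so the boundaries it prints need not coincide with those of the 'ovr'/'sag' model above.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression

iris = datasets.load_iris()
X_pw, y_cls = iris.data[:, 3:], iris.target
clf = LogisticRegression().fit(X_pw, y_cls)   # default settings (assumption of this sketch)

grid = np.linspace(0, 3, 1000).reshape(-1, 1)
labels = clf.predict(grid)

# Indices where the predicted label differs from the label at the next grid point
change_idx = np.flatnonzero(np.diff(labels) != 0)
# The midpoint of the two neighbouring grid values approximates each boundary
boundaries = (grid[change_idx, 0] + grid[change_idx + 1, 0]) / 2

print(boundaries)
```

The resolution of the estimate is limited by the grid spacing, here about 0.003 cm.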
Plot the probabilities with matplotlib.
plt.plot(X_new, y_prob[:, 2], 'g-', label='Iris-Virginica')
plt.plot(X_new, y_prob[:, 1], 'r-', label='Iris-Versicolour')
plt.plot(X_new, y_prob[:, 0], 'b-', label='Iris-Setosa')
plt.legend()
plt.show()
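The horizontal axis is the petal width, from 0 cm to 3 cm. The vertical axis is the predicted probability: the higher the value, the more likely the class.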
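Under the one-vs-rest setting, each curve comes from a separate binary model: every class gets its own sigmoid score, and the three scores are then rescaled so they sum to 1. The pure-Python sketch below illustrates that mechanism. The weights and biases are hypothetical values picked for illustration, not coefficients fitted to the iris data.

```python
import math

def sigmoid(z):
    """Logistic function mapping a real-valued score into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical (weight, bias) pairs for the three binary classifiers.
# These are illustrative numbers, NOT fitted values.
params = {
    "setosa": (-6.0, 4.5),
    "versicolor": (0.1, -0.8),
    "virginica": (5.0, -8.5),
}

def ovr_probabilities(width):
    """Per-class sigmoid scores, rescaled so that they sum to 1."""
    raw = {name: sigmoid(w * width + b) for name, (w, b) in params.items()}
    total = sum(raw.values())
    return {name: p / total for name, p in raw.items()}

for width in (0.5, 1.5, 2.5):
    probs = ovr_probabilities(width)
    print(width, max(probs, key=probs.get))
```

Note that the middle class has an almost flat score in this illustration: a single sigmoid over one feature cannot peak in the middle of the axis, so versicolor wins only where the other two scores are both low.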
This completes the model building and class prediction for iris flowers using petal width as the single feature.