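This article shows how to use Python and the scikit-learn (sklearn) library to classify the iris dataset with several different machine-learning classifiers.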
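The iris dataset contains 150 samples in total.
The feature array `data` has shape 150 × 4. Each of the 150 rows is one sample, and each of the 4 columns is one feature: sepal length, sepal width, petal length, and petal width (all in cm).
The label array `target` holds one class label per sample (a one-dimensional array of shape (150,)) and encodes the three species: 0 for setosa, 1 for versicolor, and 2 for virginica.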
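To check these shapes and label names directly, a minimal sketch that inspects the object returned by `load_iris()` could look like this (the `np.bincount` call, which counts samples per class, is an addition and not part of the original article):

```python
import numpy as np
from sklearn.datasets import load_iris

# Load the iris data as a dictionary-like Bunch object
iris = load_iris()

# Feature matrix: one row per sample, one column per feature
print(iris.data.shape)           # (150, 4)
print(iris.feature_names)        # sepal/petal length and width, in cm

# Labels: a 1-D array of integer class codes 0, 1, 2
print(iris.target.shape)         # (150,)
print(iris.target_names)         # ['setosa' 'versicolor' 'virginica']

# Number of samples in each class
print(np.bincount(iris.target))  # [50 50 50]
```

The three classes are perfectly balanced, which is why plain accuracy is a reasonable metric for the comparison that follows.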
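Eleven classifiers are used in total:
- k-nearest neighbors, KNN (KNeighborsClassifier)
- Logistic regression (LogisticRegression)
- Decision tree (DecisionTreeClassifier)
- Gradient boosting (GradientBoostingClassifier)
- AdaBoost (AdaBoostClassifier)
- Random forest (RandomForestClassifier)
- Gaussian naive Bayes (GaussianNB)
- Multinomial naive Bayes (MultinomialNB)
- Linear discriminant analysis (LinearDiscriminantAnalysis)
- Quadratic discriminant analysis (QuadraticDiscriminantAnalysis)
- Support vector machine (SVC)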
```python
# Import libraries
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import GradientBoostingClassifier, AdaBoostClassifier, RandomForestClassifier
from sklearn.naive_bayes import GaussianNB, MultinomialNB
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis, QuadraticDiscriminantAnalysis
from sklearn.svm import SVC

# Load the dataset
iris_dataset = load_iris()

# Split into training and test sets (default test_size=0.25: 112 training / 38 test samples).
# No random_state is given, so the split, and hence every accuracy below, changes on each run.
X_train, X_test, y_train, y_test = train_test_split(iris_dataset['data'], iris_dataset['target'])

# Fit each classifier with its default parameters;
# score() returns the mean accuracy on the test set

# KNN
clf = KNeighborsClassifier()
clf.fit(X_train, y_train)
print('KNN accuracy:', clf.score(X_test, y_test))

# Logistic regression (with the default max_iter it may emit a ConvergenceWarning)
clf = LogisticRegression()
clf.fit(X_train, y_train)
print('Logistic regression accuracy:', clf.score(X_test, y_test))

# Decision tree
clf = DecisionTreeClassifier()
clf.fit(X_train, y_train)
print('Decision tree accuracy:', clf.score(X_test, y_test))

# Gradient boosting
clf = GradientBoostingClassifier()
clf.fit(X_train, y_train)
print('Gradient boosting accuracy:', clf.score(X_test, y_test))

# AdaBoost
clf = AdaBoostClassifier()
clf.fit(X_train, y_train)
print('AdaBoost accuracy:', clf.score(X_test, y_test))

# Random forest
clf = RandomForestClassifier()
clf.fit(X_train, y_train)
print('Random forest accuracy:', clf.score(X_test, y_test))

# Gaussian naive Bayes
clf = GaussianNB()
clf.fit(X_train, y_train)
print('Gaussian naive Bayes accuracy:', clf.score(X_test, y_test))

# Multinomial naive Bayes (requires non-negative features; all iris measurements are positive)
clf = MultinomialNB()
clf.fit(X_train, y_train)
print('Multinomial naive Bayes accuracy:', clf.score(X_test, y_test))

# Linear discriminant analysis
clf = LinearDiscriminantAnalysis()
clf.fit(X_train, y_train)
print('Linear discriminant analysis accuracy:', clf.score(X_test, y_test))

# Quadratic discriminant analysis
clf = QuadraticDiscriminantAnalysis()
clf.fit(X_train, y_train)
print('Quadratic discriminant analysis accuracy:', clf.score(X_test, y_test))

# Support vector machine
clf = SVC()
clf.fit(X_train, y_train)
print('Support vector machine accuracy:', clf.score(X_test, y_test))
```
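Output from one run (the split is random, so the numbers differ between runs). With 38 test samples, 0.9737, 0.9474, and 0.9211 correspond to 1, 2, and 3 misclassified samples respectively:

```
KNN accuracy: 0.9736842105263158
Logistic regression accuracy: 0.9736842105263158
Decision tree accuracy: 0.9736842105263158
Gradient boosting accuracy: 0.9473684210526315
AdaBoost accuracy: 0.9210526315789473
Random forest accuracy: 0.9210526315789473
Gaussian naive Bayes accuracy: 0.9210526315789473
Multinomial naive Bayes accuracy: 0.9473684210526315
Linear discriminant analysis accuracy: 0.9736842105263158
Quadratic discriminant analysis accuracy: 0.9473684210526315
Support vector machine accuracy: 0.9736842105263158
```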
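A single train/test split scores each model on only a few dozen samples, so the ranking above can shift from one run to the next. A sketch of a steadier comparison, assuming 5-fold stratified cross-validation with `cross_val_score` (an addition, not the original article's method), collapses the eleven repeated blocks into one loop. The `max_iter=1000` setting for logistic regression is also an assumption, chosen to avoid a possible convergence warning:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import GradientBoostingClassifier, AdaBoostClassifier, RandomForestClassifier
from sklearn.naive_bayes import GaussianNB, MultinomialNB
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis, QuadraticDiscriminantAnalysis
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# The same eleven classifiers; only LogisticRegression deviates from
# the defaults (max_iter raised, an assumption made for this sketch)
classifiers = {
    'KNN': KNeighborsClassifier(),
    'Logistic regression': LogisticRegression(max_iter=1000),
    'Decision tree': DecisionTreeClassifier(),
    'Gradient boosting': GradientBoostingClassifier(),
    'AdaBoost': AdaBoostClassifier(),
    'Random forest': RandomForestClassifier(),
    'Gaussian naive Bayes': GaussianNB(),
    'Multinomial naive Bayes': MultinomialNB(),
    'Linear discriminant analysis': LinearDiscriminantAnalysis(),
    'Quadratic discriminant analysis': QuadraticDiscriminantAnalysis(),
    'Support vector machine': SVC(),
}

# Shuffled 5-fold split that keeps the 50/50/50 class balance in every fold;
# random_state fixes the folds so repeated runs use identical splits
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

results = {}
for name, clf in classifiers.items():
    # Accuracy on each of the 5 held-out folds
    scores = cross_val_score(clf, X, y, cv=cv)
    results[name] = scores
    print(f'{name}: mean accuracy {scores.mean():.3f} (std {scores.std():.3f})')
```

Every sample is used for testing exactly once, and the standard deviation shows how much of the gap between two models could simply be split-to-split noise. The tree-based models are still randomized; also passing `random_state` to them would make the whole comparison fully repeatable.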
Copyright © 2003-2013 www.wpsshop.cn 版权所有,并保留所有权利。