
dhu Data Science and Technology: Assignment 9


I. Short-answer questions (2 questions, 100 points)

  1. (Short-answer, 50 points) The wine dataset (wine.data) collects chemical measurements of wines from different wine-producing regions of France. Build three classifiers (a decision tree, an SVM and a neural network) and compare how well each performs on this dataset.
    [Hint]: For each classifier, try several parameter settings to find a reasonably good model of that type before comparing it with the other classifiers.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn import model_selection
from sklearn import tree
from sklearn import svm
from sklearn.neural_network import MLPClassifier
from pandas import DataFrame

# Load and prepare the data (wine.data has no header row).
data = pd.read_csv('C:\\python\\wine.data', header=None)
data.columns = [str(i) for i in range(14)]
X = data.iloc[:, 1:14].values.astype(float)   # 13 chemical measurements
y = data.iloc[:, 0].values.astype(int)        # class label (1, 2 or 3)

# Split into training and test sets.
X_train, X_test, y_train, y_test = model_selection.train_test_split(X, y, test_size=0.25, random_state=1)

# Decision tree
learning = tree.DecisionTreeClassifier()
learning.fit(X_train, y_train)
d1 = learning.score(X_test, y_test)
print('Decision tree accuracy: {:.2f}'.format(d1))

# SVM (gamma is ignored by the linear kernel, so it is omitted here)
learning = svm.SVC(kernel='linear', C=100)
learning.fit(X_train, y_train)
d2 = learning.score(X_test, y_test)
print('SVM accuracy: {:.2f}'.format(d2))

# Neural network
learning = MLPClassifier(solver='lbfgs', batch_size='auto', random_state=1)
learning.fit(X_train, y_train)
d3 = learning.score(X_test, y_test)
print('Neural network accuracy: {:.2f}'.format(d3))

# Compare the three test-set scores in a bar chart.
d = DataFrame([d1, d2, d3], columns=['score'], index=['tree', 'svm', 'MLP'])
d.plot(kind='bar', rot=0, use_index=True)
plt.show()
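Following the hint, each classifier's parameters should be searched before the final comparison. Below is a minimal sketch of such a search using scikit-learn's GridSearchCV, wrapping each model in a StandardScaler pipeline; the parameter grids are illustrative assumptions, not values given in the assignment.

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import GridSearchCV

# Candidate models and illustrative parameter grids (assumed values).
candidates = {
    'tree': (tree.DecisionTreeClassifier(random_state=1),
             {'clf__max_depth': [3, 5, None], 'clf__min_samples_leaf': [1, 3, 5]}),
    'svm': (svm.SVC(),
            {'clf__kernel': ['linear', 'rbf'], 'clf__C': [0.1, 1, 10, 100],
             'clf__gamma': ['scale', 0.01, 0.1]}),
    'mlp': (MLPClassifier(solver='lbfgs', max_iter=2000, random_state=1),
            {'clf__hidden_layer_sizes': [(10,), (50,), (100,)],
             'clf__alpha': [1e-4, 1e-2]}),
}

for name, (clf, grid) in candidates.items():
    # Scaling helps the SVM and MLP; the decision tree is unaffected by it.
    pipe = Pipeline([('scale', StandardScaler()), ('clf', clf)])
    search = GridSearchCV(pipe, grid, cv=5)
    search.fit(X_train, y_train)
    print('{}: best CV score {:.2f}, test score {:.2f}'.format(
        name, search.best_score_, search.score(X_test, y_test)))

The best estimator found for each family can then be compared on the held-out test set, which is what the bar chart above does with the untuned models.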
  2. (Short-answer, 50 points) bankpep.csv
    Build a deep neural network model with Keras and train it as a classifier on the bankpep dataset. Compare its training time and model performance with XGBoost, SVM, naive Bayes and other methods.

import datetime
import time
import pandas as pd
from sklearn import model_selection
from sklearn.naive_bayes import GaussianNB
from sklearn import svm
from sklearn.neural_network import MLPClassifier
from pandas import DataFrame
import matplotlib.pyplot as plt
from xgboost import XGBClassifier

# Load the data and encode the categorical columns as numbers.
data = pd.read_csv('C:\\python\\bankpep.csv', index_col=0, header=0)

seq1 = ['married', 'car', 'save_act', 'current_act', 'mortgage', 'pep']
for feature in seq1:
    data.loc[data[feature]=='YES', feature] = 1
    data.loc[data[feature]=='NO', feature] = 0
data.loc[data['sex']=='FEMALE', 'sex'] = 0
data.loc[data['sex']=='MALE', 'sex'] = 1
data.loc[data['region']=='INNER_CITY', 'region'] = 1
data.loc[data['region']=='RURAL', 'region'] = 2
data.loc[data['region']=='TOWN', 'region'] = 3
data.loc[data['region']=='SUBURBAN', 'region'] = 4
X1 = data.iloc[:, 0:10].values.astype(float)   # all ten feature columns
y1 = data.iloc[:, 10].values.astype(int)       # target column 'pep'

# Neural network (sklearn's MLPClassifier stands in here; a Keras sketch follows at the end)
X_train1, X_test1, y_train1, y_test1 = model_selection.train_test_split(X1, y1, test_size=0.25, random_state=int(time.time()))
s1 = datetime.datetime.now()
learning = MLPClassifier(solver='lbfgs', activation='identity', random_state=1)
learning.fit(X_train1, y_train1)
s2 = datetime.datetime.now()
s_1 = s2 - s1
d1 = learning.score(X_test1, y_test1)
print('Neural network accuracy: {:.2f}, training time {}'.format(d1, s_1))

# Naive Bayes
s1 = datetime.datetime.now()
learning = GaussianNB()
learning.fit(X_train1, y_train1)
s2 = datetime.datetime.now()
s_2 = s2 - s1
d2 = learning.score(X_test1, y_test1)
print('Naive Bayes accuracy: {:.2f}, training time {}'.format(d2, s_2))

# SVM: one-hot encode the multi-valued columns 'region' and 'children' first
# (the binary columns, including 'sex', are already 0/1 from the encoding above).
dumm_reg = pd.get_dummies(data['region'], prefix='region')
dumm_child = pd.get_dummies(data['children'], prefix='children')
df1 = data.drop(['region', 'children'], axis=1)
df2 = df1.join([dumm_reg, dumm_child], how='outer')
X3 = df2.drop(['pep'], axis=1).values.astype(float)
y3 = df2['pep'].values.astype(int)
X_train3, X_test3, y_train3, y_test3 = model_selection.train_test_split(X3, y3, test_size=0.25, random_state=int(time.time()))
s1 = datetime.datetime.now()
learning = svm.SVC(kernel='rbf', gamma=0.7, C=0.001)
learning.fit(X_train3, y_train3)
s2 = datetime.datetime.now()
s_3 = s2 - s1
d3 = learning.score(X_test3, y_test3)
print('SVM accuracy: {:.2f}, training time {}'.format(d3, s_3))

# XGBoost
s1 = datetime.datetime.now()
learning = XGBClassifier(max_depth=6, gamma=0, subsample=1, colsample_bytree=1)
learning.fit(X_train3, y_train3)
s2 = datetime.datetime.now()
s_4 = s2 - s1
d4 = learning.score(X_test3, y_test3)
print('XGBoost accuracy: {:.2f}, training time {}'.format(d4, s_4))

# Compare the four test-set scores in a bar chart.
scores = DataFrame([d1, d2, d3, d4], columns=['score'], index=['MLP', 'Naive Bayes', 'SVM', 'XGBoost'])
scores.plot(kind='bar', title='classifier score on test set', rot=0)
plt.show()
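The assignment asks for a deep neural network built with Keras, whereas the code above substitutes scikit-learn's MLPClassifier. Here is a minimal Keras sketch that reuses X_train1/y_train1 from above, assuming TensorFlow 2.x is installed; the layer sizes, epoch count and batch size are illustrative choices, not tuned values.

from tensorflow import keras

s1 = datetime.datetime.now()
model = keras.Sequential([
    keras.layers.Dense(32, activation='relu', input_shape=(X_train1.shape[1],)),
    keras.layers.Dense(16, activation='relu'),
    keras.layers.Dense(1, activation='sigmoid'),   # binary target: pep yes/no
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
# Scaling the inputs (e.g. with StandardScaler) would likely help training here.
model.fit(X_train1, y_train1, epochs=50, batch_size=16, verbose=0)
s2 = datetime.datetime.now()
loss, acc = model.evaluate(X_test1, y_test1, verbose=0)
print('Keras DNN accuracy: {:.2f}, training time {}'.format(acc, s2 - s1))

The elapsed time measured around model.fit can be set alongside s_1 to s_4 above to compare training cost as well as accuracy across the five methods.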