1. First, read in the data (already preprocessed: deletion, imputation, type conversion, normalization), define a function that computes accuracy, precision, recall and F1-score, and apply sklearn's recursive feature elimination (RFE) for feature selection.

```python
from sklearn import metrics
import pandas as pd
from sklearn.ensemble import RandomForestClassifier  # needed as the RFE estimator below
from sklearn.feature_selection import RFE

def metrics_result(true, predict):
    """Compute and return accuracy, precision, recall and F1-score."""
    acc = metrics.accuracy_score(true, predict)
    pre = metrics.precision_score(true, predict)
    reca = metrics.recall_score(true, predict)
    f_sco = metrics.f1_score(true, predict)
    # note: metrics.auc(true, predict) is not appropriate here; AUC needs
    # probability scores, e.g. metrics.roc_auc_score(true, scores)
    return acc, pre, reca, f_sco


# read in the preprocessed data (deletion, imputation,
# type conversion, normalization already done)
data = pd.read_csv('data_imp.csv')
label = data['status']
data = data.iloc[:, :-1]

# recursive feature elimination with a random forest, keeping 30 features
data_ = RFE(estimator=RandomForestClassifier(),
            n_features_to_select=30).fit_transform(data, label)
```
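RFE also records which of the original columns survived the elimination, which is useful for interpreting the model afterwards. A minimal sketch on synthetic data (not the post's `data_imp.csv`), using the fitted selector's `support_` mask:

```python
# Hypothetical illustration on synthetic data: after fitting, RFE exposes
# which columns were kept via support_ (boolean mask) and ranking_.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
rfe = RFE(estimator=RandomForestClassifier(random_state=0),
          n_features_to_select=5)
X_sel = rfe.fit_transform(X, y)

selected = rfe.support_      # boolean mask over the original columns
print(X_sel.shape)           # (200, 5)
print(int(selected.sum()))   # 5
```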
2. Define the classification functions.

```python
from sklearn.model_selection import StratifiedKFold
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn import svm
from xgboost.sklearn import XGBClassifier

def LR_classifier(train_data, train_label, test_data, test_label):
    clf = LogisticRegression(C=1.0, max_iter=1000)
    clf.fit(train_data, train_label)  # fit once, then predict on both sets
    return clf.predict(test_data), clf.predict(train_data)

def svm_classifier(train_data, train_label, test_data, test_label):
    clf = svm.SVC(C=1.0, kernel='linear')  # gamma is ignored by the linear kernel
    clf.fit(train_data, train_label)
    return clf.predict(test_data), clf.predict(train_data)

def dt_classifier(train_data, train_label, test_data, test_label):
    clf = DecisionTreeClassifier(max_depth=5)
    clf.fit(train_data, train_label)
    return clf.predict(test_data), clf.predict(train_data)

def rf_classifier(train_data, train_label, test_data, test_label):
    clf = RandomForestClassifier(n_estimators=8, random_state=5,
                                 max_depth=6, min_samples_split=2)
    clf.fit(train_data, train_label)
    return clf.predict(test_data), clf.predict(train_data)

def xgb_classifier(train_data, train_label, test_data, test_label):
    # num_class applies only to multi-class objectives, so it is dropped here
    clf = XGBClassifier(n_estimators=8, learning_rate=0.25, max_depth=20,
                        subsample=1, gamma=13, random_state=1000)
    clf.fit(train_data, train_label)
    return clf.predict(test_data), clf.predict(train_data)
```
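The cross-validation loop itself is not shown in the post. A minimal self-contained sketch of how the imported `StratifiedKFold` could drive the classifier functions above (stand-in copies of `metrics_result` and `LR_classifier` are included so the example runs on its own, on synthetic data):

```python
# Hypothetical sketch of the 4-fold CV loop the post describes but omits.
import numpy as np
from sklearn import metrics
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

# stand-ins for the post's metrics_result and LR_classifier
def metrics_result(true, predict):
    return (metrics.accuracy_score(true, predict),
            metrics.precision_score(true, predict, zero_division=0),
            metrics.recall_score(true, predict),
            metrics.f1_score(true, predict))

def LR_classifier(train_data, train_label, test_data, test_label):
    clf = LogisticRegression(max_iter=1000).fit(train_data, train_label)
    return clf.predict(test_data), clf.predict(train_data)

def cross_validate(classifier, data, label, n_splits=4):
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    train_scores, test_scores = [], []
    for train_idx, test_idx in skf.split(data, label):
        pred_test, pred_train = classifier(data[train_idx], label[train_idx],
                                           data[test_idx], label[test_idx])
        train_scores.append(metrics_result(label[train_idx], pred_train))
        test_scores.append(metrics_result(label[test_idx], pred_test))
    # average each of the four metrics across the folds
    return np.mean(train_scores, axis=0), np.mean(test_scores, axis=0)

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
train_avg, test_avg = cross_validate(LR_classifier, X, y)
print(len(train_avg))  # 4 metrics: accuracy, precision, recall, F1
```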
Then 4-fold cross-validation is performed; the results are as follows (the original post also showed an ROC curve plot for each model, not reproduced here):

Model | accuracy | precision | recall | f1_score | auc
---|---|---|---|---|---
Logistic Regression | train: 0.7909, test: 0.7868 | train: 0.7352, test: 0.7209 | train: 0.2638, test: 0.2525 | train: 0.3883, test: 0.3731 | train: 0.6132, test: 0.6216
Support Vector Machine | train: 0.778, test: 0.7753 | train: 0.8144, test: 0.8023 | train: 0.1524, test: 0.1437 | train: 0.2568, test: 0.2433 | train: 0.5694, test: 0.5743
Decision Tree | train: 0.8082, test: 0.7735 | train: 0.766, test: 0.6196 | train: 0.345, test: 0.2704 | train: 0.4734, test: 0.3755 | train: 0.6272, test: 0.6138
Random Forest | train: 0.8218, test: 0.7768 | train: 0.8931, test: 0.661 | train: 0.3321, test: 0.2354 | train: 0.4839, test: 0.3457 | train: 0.6642, test: 0.6119
XGBoost | train: 0.8138, test: 0.7845 | train: 0.7897, test: 0.6655 | train: 0.3543, test: 0.2875 | train: 0.4889, test: 0.4012 | train: 0.6610, test: 0.6455
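The AUC column in the table cannot come from `metrics.auc(true, predict)` on hard labels; it needs probability scores. A minimal sketch (synthetic data, logistic regression as a stand-in for any of the models above) of computing AUC and the points of an ROC curve:

```python
# Hypothetical illustration: AUC and ROC-curve points from predicted
# probabilities, using roc_auc_score / roc_curve on synthetic data.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=300, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
scores = clf.predict_proba(X_te)[:, 1]   # probability of the positive class
auc = roc_auc_score(y_te, scores)
fpr, tpr, _ = roc_curve(y_te, scores)    # points for plotting the ROC curve
print(round(auc, 3))
```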