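The previous post walked through the derivation of the XGBoost algorithm. When we use the algorithm through the library, there are a few parameters we need to understand.
https://blog.csdn.net/weixin_43172660/article/details/83048394 (the previous post)
Let's start with how to call the xgboost package.
First, import the package and load the data (the package can be installed with pip install xgboost).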
import pandas as pd
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# The CSV has no header row: columns 0-7 are the features, column 8 is the label
data = pd.read_csv('pima-indians-diabetes.csv', header=None)
X = data.iloc[:, 0:8]
y = data.iloc[:, 8]
# Hold out 33% of the rows as a test set; the fixed seed makes the split reproducible
test_size = 0.33
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=test_size, random_state=7)
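Next we create the model and train it. The arguments passed to fit mean the following: eval_set is a list of (X, y) pairs on which the metric is evaluated after every boosting round; eval_metric="logloss" selects log loss as that metric; early_stopping_rounds=5 stops training once the metric on the evaluation set has not improved for 5 rounds.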
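model = XGBClassifier()
# The test split doubles as the validation set that early stopping monitors
eval_set = [(X_test, y_test)]
# Note: newer xgboost releases expect early_stopping_rounds and eval_metric
# to be passed to the XGBClassifier constructor instead of fit()
model.fit(X_train, y_train, early_stopping_rounds=5, eval_metric="logloss", eval_set=eval_set)
y_pred = model.predict(X_test)
score = accuracy_score(y_test, y_pred)
print(score)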
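To make the two numbers in this run concrete, here is a minimal pure-Python sketch of what the logloss metric and the accuracy score measure. The labels and probabilities are made up for illustration, the helper names log_loss and accuracy are my own, and the 0.5 decision threshold is an assumption about how probabilities are turned into class labels.

```python
import math

def log_loss(y_true, p_pred, eps=1e-15):
    """Mean binary cross-entropy between 0/1 labels and predicted P(y=1)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip so log() never receives 0
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

def accuracy(y_true, p_pred, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    hits = sum(int(p > threshold) == y for y, p in zip(y_true, p_pred))
    return hits / len(y_true)

# Hypothetical labels and predicted probabilities, for illustration only
y_true = [1, 0, 1, 0]
p_pred = [0.9, 0.2, 0.6, 0.4]

print(log_loss(y_true, p_pred))
print(accuracy(y_true, p_pred))
```

A useful reference point: a model that always predicts 0.5 has a log loss of ln 2, about 0.693, so lower values mean the predicted probabilities carry real information.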
The training run prints the following log, ending with the test accuracy:
[0] validation_0-logloss:0.660186
Will train until validation_0-logloss hasn't improved in 5 rounds.
[1] validation_0-logloss:0.634854
[2] validation_0-logloss:0.612239
[3] validation_0-logloss:0.593118
[4] validation_0-logloss:0.578303
[5] validation_0-logloss:0.564942
[6] validation_0-logloss:0.555113
[7] validation_0-logloss:0.54499
[8] validation_0-logloss:0.539151
[9] validation_0-logloss:0.531819
[10] validation_0-logloss:0.526065
[11] validation_0-logloss:0.51977
[12] validation_0-logloss:0.514979
[13] validation_0-logloss:0.50927
[14] validation_0-logloss:0.506086
[15] validation_0-logloss:0.503565
[16] validation_0-logloss:0.503591
[17] validation_0-logloss:0.500805
[18] validation_0-logloss:0.497605
[19] validation_0-logloss:0.495328
[20] validation_0-logloss:0.494777
[21] validation_0-logloss:0.494274
[22] validation_0-logloss:0.493333
[23] validation_0-logloss:0.492211
[24] validation_0-logloss:0.491936
[25] validation_0-logloss:0.490578
[26] validation_0-logloss:0.490895
[27] validation_0-logloss:0.490646
[28] validation_0-logloss:0.491911
[29] validation_0-logloss:0.491407
[30] validation_0-logloss:0.488828
[31] validation_0-logloss:0.487867
[32] validation_0-logloss:0.487297
[33] validation_0-logloss:0.487562
[34] validation_0-logloss:0.487788
[35] validation_0-logloss:0.487962
[36] validation_0-logloss:0.488218
[37] validation_0-logloss:0.489582
Stopping. Best iteration:
[32] validation_0-logloss:0.487297

0.7755905511811023
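The stopping point in this log can be reproduced by hand. The sketch below replays the logged logloss values through a simple patience rule: stop once the current round is 5 or more rounds past the best one. The function name replay_early_stopping is my own, and the rule is a simplified model of the behaviour visible in the log, not the library's actual implementation.

```python
def replay_early_stopping(losses, patience):
    """Return (best_round, stop_round) for a lower-is-better metric.

    stop_round is None if the patience is never exhausted.
    """
    best_round, best_loss = 0, losses[0]
    for rnd, loss in enumerate(losses):
        if loss < best_loss:
            best_round, best_loss = rnd, loss
        elif rnd - best_round >= patience:
            return best_round, rnd
    return best_round, None

# validation_0-logloss for rounds 0..37, copied from the log above
losses = [
    0.660186, 0.634854, 0.612239, 0.593118, 0.578303, 0.564942,
    0.555113, 0.54499, 0.539151, 0.531819, 0.526065, 0.51977,
    0.514979, 0.50927, 0.506086, 0.503565, 0.503591, 0.500805,
    0.497605, 0.495328, 0.494777, 0.494274, 0.493333, 0.492211,
    0.491936, 0.490578, 0.490895, 0.490646, 0.491911, 0.491407,
    0.488828, 0.487867, 0.487297, 0.487562, 0.487788, 0.487962,
    0.488218, 0.489582,
]

best_round, stop_round = replay_early_stopping(losses, patience=5)
print(best_round, losses[best_round], stop_round)
```

Rounds 26 to 29 also fail to beat round 25, but that is only four rounds in a row, so training continues until round 30 sets a new best. After round 32, rounds 33 to 37 make five rounds without improvement, which is where the log stops.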
We can also call plot_importance from xgboost to see how important each feature is:
from xgboost import plot_importance
from matplotlib import pyplot
plot_importance(model)
pyplot.show()
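By default plot_importance ranks features by "weight", that is, the number of times a feature is used to split a node across all trees. The sketch below shows that counting idea on two tiny hand-written trees. The nested-dict tree layout, the feature names and the leaf values are all hypothetical and are not xgboost's internal format.

```python
from collections import Counter

def count_splits(node, counts):
    """Recursively count how often each feature is used as a split."""
    if "feature" not in node:  # leaf node: nothing to count
        return
    counts[node["feature"]] += 1
    count_splits(node["left"], counts)
    count_splits(node["right"], counts)

# Two tiny hypothetical trees, for illustration only
trees = [
    {"feature": "f1",
     "left": {"feature": "f5", "left": {"leaf": 0.1}, "right": {"leaf": -0.2}},
     "right": {"leaf": 0.3}},
    {"feature": "f1",
     "left": {"leaf": -0.1},
     "right": {"feature": "f7", "left": {"leaf": 0.2}, "right": {"leaf": 0.4}}},
]

weights = Counter()
for tree in trees:
    count_splits(tree, weights)

# Print features from most to least frequently used
for feature, n in weights.most_common():
    print(feature, n)
```

Split counts favour features that are reused often, even when each split helps only a little, so it can be worth comparing against a gain-based ranking before drawing conclusions.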