
Step 32. Machine Learning Classification in Practice: SHAP

Installing the shap library

Continuing to fill in earlier gaps, this time we return to SHAP. This part is optional; read on if you're interested.

We have built ten ML models. If the best performer turns out to be a tree model such as XGBoost, LightGBM, or CatBoost (these are most likely the strongest anyway), we can use SHAP to visualize and interpret it.

(1) First, install the library with pip install shap. Remember to type the command in the Anaconda Prompt:
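As a minimal setup sketch, the install plus a quick sanity check look like this in the Anaconda Prompt (the version-print line is just an optional way to confirm the install succeeded):

```shell
# Install shap into the active conda environment
pip install shap
# Optional: confirm the install by printing the installed version
python -c "import shap; print(shap.__version__)"
```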

(2) Then, taking XGBoost as the example, let's get started:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

dataset = pd.read_csv('X disease code fs.csv')
X = dataset.iloc[:, 1:14].values
Y = dataset.iloc[:, 0].values

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.30, random_state=666)

from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

import xgboost as xgb
param_grid = [{
    'n_estimators': [35],
    'eta': [0.1],
    'max_depth': [1],
    'gamma': [0],
    'min_child_weight': [5],
    'max_delta_step': [1],
    'subsample': [0.8],
    'colsample_bytree': [0.8],
    'colsample_bylevel': [0.8],
    'reg_lambda': [9],
    'reg_alpha': [5],
}]

boost = xgb.XGBClassifier()
from sklearn.model_selection import GridSearchCV
grid_search = GridSearchCV(boost, param_grid, n_jobs=-1, verbose=2, cv=10)
grid_search.fit(X_train, y_train)
classifier = grid_search.best_estimator_
classifier.fit(X_train, y_train)

y_pred = classifier.predict(X_test)
y_testprba = classifier.predict_proba(X_test)[:, 1]
y_trainpred = classifier.predict(X_train)
y_trainprba = classifier.predict_proba(X_train)[:, 1]

from sklearn.metrics import confusion_matrix
cm_test = confusion_matrix(y_test, y_pred)
cm_train = confusion_matrix(y_train, y_trainpred)
print(cm_train)
print(cm_test)

# Plot the SHAP figures (install the library first: pip install shap)
import shap
explainer = shap.TreeExplainer(classifier)
shap.initjs()
shap_values = explainer.shap_values(X_train)
shap.summary_plot(shap_values, X_train)

The output is as follows:

Here "Feature 0" is B, the first feature we imported; the rest follow the left-to-right column order:

You can compare this with XGBoost's built-in feature importances; the results are broadly similar:

 

For the underlying theory and interpretation, see the links below; I won't go into detail here:

A complete summary of SHAP concepts - Zhihu

https://www.kaggle.com/code/dansbecker/shap-values/tutorial
