
Building Random Forest and XGBoost in Python

Core function


def train_model(x_train, y_train, x_test, model_name, cv_state=True):
    '''
    Parameters
    ----------
    x_train : training-set x, 2-D np.array, [samples_train, features]
    y_train : training-set y, 1-D np.array, [samples_train]
    x_test : test-set x, 2-D np.array, [samples_test, features]
    model_name : which model to use: 'random forest' / 'XGB'
    cv_state : whether to run hyperparameter selection

    Returns
    -------
    y_predictoftest : predictions for the test set, derived from x_test
    '''
    if cv_state:  # cv_state=True: use GridSearchCV to pick the best parameters
        if model_name == 'random forest':
            model = RandomForestRegressor(random_state=1, criterion='squared_error')
            # parameter grid (max_features up to 31 assumes at least 31 input features)
            parameters = [{"max_features": range(1, 32, 3),
                           "min_samples_leaf": range(1, 20, 3),
                           "max_depth": range(1, 20, 3)}]
            grid = GridSearchCV(model, parameters, cv=10,
                                scoring="neg_mean_squared_error", verbose=10)
            grid.fit(x_train, y_train)
            print('best_params_=', grid.best_params_)
            print('best_score_=', grid.best_score_)
            # refit with the best parameters; note 'mse' was renamed
            # 'squared_error' in scikit-learn 1.0
            model = RandomForestRegressor(random_state=1, criterion='squared_error',
                                          max_features=grid.best_params_['max_features'],
                                          min_samples_leaf=grid.best_params_['min_samples_leaf'],
                                          max_depth=grid.best_params_['max_depth'])
        elif model_name == 'XGB':
            model = xgb.XGBRegressor(random_state=1)
            # parameter grid
            parameters = [{"eta": [0.3, 0.2, 0.1],
                           "max_depth": [3, 5, 6, 10, 20],
                           "n_estimators": [100, 200, 500],
                           'gamma': [0, 0.1, 0.2, 0.5, 1]}]
            grid = GridSearchCV(model, parameters, cv=10,
                                scoring="neg_mean_squared_error", verbose=10)
            grid.fit(x_train, y_train)
            print('best_params_=', grid.best_params_)
            print('best_score_=', grid.best_score_)
            model = xgb.XGBRegressor(random_state=1,
                                     eta=grid.best_params_['eta'],
                                     max_depth=grid.best_params_['max_depth'],
                                     n_estimators=grid.best_params_['n_estimators'],
                                     gamma=grid.best_params_['gamma'])
    else:  # cv_state=False: edit the parameters below as needed and predict directly
        if model_name == 'random forest':
            model = RandomForestRegressor(random_state=1, criterion='squared_error',
                                          max_depth=7, max_features=31,
                                          min_samples_leaf=10)
        elif model_name == 'XGB':
            # model = xgb.XGBRegressor(random_state=1, learning_rate=0.1, max_depth=2, n_estimators=100)
            model = xgb.XGBRegressor(random_state=1, gamma=0.1, max_depth=3, n_estimators=100)
    regr = model.fit(x_train, y_train)
    y_predictoftest = regr.predict(x_test)
    return y_predictoftest

Before calling train_model, you also need to do the following:

1. Import the required libraries:

import numpy as np
import pandas as pd
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestRegressor
import xgboost as xgb
from sklearn.model_selection import StratifiedKFold  # optional; not used by train_model itself

2. Prepare your own data

x_train, y_train, x_test, y_test: see the docstring of train_model for the expected types and shapes.
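The function expects plain NumPy arrays. A minimal sketch using scikit-learn's train_test_split on made-up data (the array sizes and test_size=0.2 are illustrative choices, not prescribed by the article):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up feature matrix (500 samples, 10 features) and target vector
rng = np.random.default_rng(42)
x = rng.normal(size=(500, 10))
y = rng.normal(size=500)

# Split into the arrays train_model expects:
# x_train [samples_train, features], y_train [samples_train], x_test [samples_test, features]
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=1)
print(x_train.shape, y_train.shape, x_test.shape)  # (400, 10) (400,) (100, 10)
```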

3. Call the function

y_predictoftest = train_model(x_train, y_train, x_test, 'XGB', cv_state=True)

To use XGBoost, pass 'XGB';

to use random forest, pass 'random forest'.

cv_state: whether to tune hyperparameters with GridSearchCV.
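Since the grid search scores models with neg_mean_squared_error, it is natural to evaluate the returned predictions the same way. A sketch, where y_test and the prediction values are made up to stand in for your data and the output of train_model:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Made-up ground truth and predictions, standing in for y_test and
# the y_predictoftest returned by train_model
y_test = np.array([1.0, 2.0, 3.0, 4.0])
y_predictoftest = np.array([1.1, 1.9, 3.2, 3.8])

# Mean squared error: average of (0.1^2 + 0.1^2 + 0.2^2 + 0.2^2) = 0.1 / 4
mse = mean_squared_error(y_test, y_predictoftest)
print(round(mse, 4))  # 0.025
```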

4. To tune the parameters yourself, consult the official documentation

Random Forest section of the scikit-learn documentation:

Classifier:

sklearn.ensemble.RandomForestClassifier — scikit-learn 1.1.1 documentation

Regressor:

sklearn.ensemble.RandomForestRegressor — scikit-learn 1.1.1 documentation

XGBoost:

XGBoost Parameters — xgboost 1.6.1 documentation
