XGBoost and LightGBM

Author: 花生_TL007 | 2024-02-16 22:42:12

This article draws mainly on the following pages:
https://cloud.tencent.com/developer/article/1389899
https://cloud.tencent.com/developer/article/1052678
https://cloud.tencent.com/developer/article/1052664

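XGBoost
1. Key parameters in detail
booster [default=gbtree]: the booster to use, either gbtree (tree-based) or gblinear (linear model)
nthread: number of threads used for training
eta [default=0.3]: shrinkage step size (learning rate) applied after each boosting round, which helps prevent overfitting
max_depth [default=6]: maximum depth of a tree
min_child_weight [default=1]: minimum sum of instance weights required in a child node
subsample [default=1]: fraction of the full training set sampled to train the model
lambda [default=1 for gbtree, 0 for gblinear]: L2 regularization penalty coefficient
alpha [default=0]: L1 regularization penalty coefficient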
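objective [default=reg:linear]: defines the learning task and the corresponding learning objective (newer XGBoost releases name the default objective reg:squarederror)
The available objective functions are:
"reg:linear": linear regression.
"reg:logistic": logistic regression.
"binary:logistic": logistic regression for binary classification; the output is a probability.
"binary:logitraw": logistic regression for binary classification; the output is the raw score wTx before the logistic transformation.
"count:poisson": Poisson regression for count data; the output is the mean of the Poisson distribution. In Poisson regression, max_delta_step defaults to 0.7.
"multi:softmax": makes XGBoost use the softmax objective for multiclass classification; the parameter num_class (number of classes) must also be set.
"multi:softprob": same as softmax, but the output is a vector of length ndata * nclass, which can be reshaped into a matrix with ndata rows and nclass columns. Each row gives the probability that the sample belongs to each class.
"rank:pairwise": sets XGBoost to do a ranking task by minimizing the pairwise loss.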
eval_metric [default according to objective]: the evaluation metric applied to the validation data
"rmse": root mean square error
"logloss": negative log-likelihood
"error": binary classification error rate
"merror": multiclass classification error rate
"mlogloss": multiclass log loss
"auc": area under the curve for ranking evaluation
"ndcg": normalized discounted cumulative gain
"map": mean average precision
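
To make the relation between the objectives and the evaluation metrics above more concrete, here is a minimal pure-Python sketch. It is not XGBoost's own code: the helper names (sigmoid, softmax, reshape_softprob, rmse, logloss, error_rate) and the sample scores are illustrative assumptions, and the formulas are the textbook definitions of these quantities.

```python
import math


def sigmoid(score):
    """Map a raw score (what binary:logitraw reports) to a probability."""
    return 1.0 / (1.0 + math.exp(-score))


def softmax(scores):
    """Turn one sample's per-class raw scores into class probabilities."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def reshape_softprob(flat, n_class):
    """Reshape a flat ndata * nclass vector into ndata rows of nclass values."""
    return [flat[i:i + n_class] for i in range(0, len(flat), n_class)]


def rmse(y_true, y_pred):
    """Root mean square error."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def logloss(y_true, probs, eps=1e-15):
    """Negative log-likelihood for binary labels in {0, 1}."""
    total = 0.0
    for t, p in zip(y_true, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(y_true)


def error_rate(y_true, probs, threshold=0.5):
    """Share of samples whose thresholded prediction differs from the label."""
    wrong = sum(1 for t, p in zip(y_true, probs) if (p > threshold) != (t == 1))
    return wrong / len(y_true)


# Hypothetical raw scores for four samples and their true labels.
raw_scores = [2.0, -1.0, 0.5, -3.0]
labels = [1, 0, 0, 0]
probs = [sigmoid(s) for s in raw_scores]

print("probabilities:", probs)
print("logloss:", logloss(labels, probs))
print("error:", error_rate(labels, probs))
print("softmax:", softmax([1.0, 2.0, 3.0]))
print("reshaped:", reshape_softprob([0.1, 0.9, 0.8, 0.2], 2))
```

The sketch treats a probability above 0.5 as a positive prediction when computing the error rate; a different threshold can be passed if that assumption does not fit the task.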

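2. Hands-on usage
a. Loading data
XGBoost can load data from the following sources:
text data in libsvm format;
two-dimensional NumPy arrays;
XGBoost binary cache files.
The loaded data is stored in a DMatrix object.
import xgboost as xgb
train = xgb.DMatrix('train.txt')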
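
Since the libsvm text format is not shown here, the following standard-library sketch writes and reads back a tiny file in that layout: one sample per line, the label first, then space-separated index:value pairs for the non-zero features. The helpers write_libsvm and read_libsvm, the sample values, and the zero-based feature indices are assumptions made for illustration; XGBoost does its own parsing when it builds a DMatrix.

```python
import os
import tempfile


def write_libsvm(path, labels, rows):
    """Write samples as '<label> <index>:<value> ...', one sample per line.

    rows is a list of dicts mapping feature index to value; features that
    are absent from a dict are simply omitted from the line (sparse format).
    """
    with open(path, "w") as f:
        for label, row in zip(labels, rows):
            feats = " ".join(f"{idx}:{val}" for idx, val in sorted(row.items()))
            f.write(f"{label} {feats}\n")


def read_libsvm(path):
    """Parse the file back into a list of labels and a list of sparse rows."""
    labels, rows = [], []
    with open(path) as f:
        for line in f:
            parts = line.split()
            if not parts:
                continue  # skip blank lines
            labels.append(float(parts[0]))
            row = {}
            for item in parts[1:]:
                idx, val = item.split(":")
                row[int(idx)] = float(val)
            rows.append(row)
    return labels, rows


# Two hypothetical samples with sparse features.
labels = [1, 0]
rows = [{0: 0.5, 3: 1.2}, {1: 2.0}]

path = os.path.join(tempfile.mkdtemp(), "train.txt")
write_libsvm(path, labels, rows)
parsed_labels, parsed_rows = read_libsvm(path)

print(open(path).read())
print(parsed_labels, parsed_rows)
```

A file produced this way is the kind of input meant by 'train.txt' above. Recent XGBoost releases may expect the format to be stated in the path, for example 'train.txt?format=libsvm', so it is worth checking the documentation for the installed version.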
