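ML with LightGBM: A Detailed Guide to Binary Classification of Whether Bank Customers Will Purchase a Product, Using Data Preprocessing (Distribution Plots and Heatmaps / Feature Binning / Label Encoding) and LightGBM (Cross-Validated Training / AUC Curve Visualization / SHAP Model Interpretability)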
Contents
# 2.1 Distribution visualization: check whether the training and test sets follow the same distribution
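Related articles
ML with LightGBM: Binary Classification of Whether Bank Customers Will Purchase a Product, Using Data Preprocessing (Distribution Plots and Heatmaps / Feature Binning / Label Encoding) and LightGBM (Cross-Validated Training / AUC Curve Visualization / SHAP Model Interpretability): Implementation Code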
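The task is set against the background of bank product subscription prediction: predict whether a customer will purchase the bank's product. While communicating with customers, the bank recorded the number of contacts with each customer, the duration of the last contact, and the time interval since the last contact. The banking system also stores basic customer information, including age, job, marital status, whether the customer has previously defaulted, and whether the customer has a housing loan. In addition, the data includes indicators of current market conditions: employment, consumer information, the interbank lending rate, and so on.
Customer purchase prediction is an important application scenario in digital marketing. This task encourages learners to use marketing campaign information to provide sales strategies for businesses and more suitable product recommendations for consumers.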
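| Field | Description |
| --- | --- |
| age | Age |
| job | Job: admin, unknown, unemployed, management… |
| marital | Marital status: married, divorced, single |
| default | Whether the customer has a credit card default: yes or no |
| housing | Whether the customer has a housing loan: yes or no |
| contact | Contact method: unknown, telephone, cellular |
| month | Month of the last contact: jan, feb, mar, … |
| day_of_week | Day of the week of the last contact: mon, tue, wed, thu, fri |
| duration | Duration of the last contact (seconds) |
| campaign | Number of contacts with the customer during this campaign |
| pdays | Number of days since the previous contact with the customer |
| previous | Number of contacts with the customer before this campaign |
| poutcome | Outcome of the previous marketing campaign: unknown, other, failure, success |
| emp_var_rate | Employment variation rate (quarterly indicator) |
| cons_price_index | Consumer price index (monthly indicator) |
| cons_conf_index | Consumer confidence index (monthly indicator) |
| lending_rate3m | Interbank lending rate, 3-month rate (daily indicator) |
| nr_employed | Number of employees (quarterly indicator) |
| subscribe | Whether the customer purchased the product: yes or no |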
| id | age | job | marital | education | default | housing | loan | contact | month | day_of_week | duration | campaign | pdays | previous | poutcome | emp_var_rate | cons_price_index | cons_conf_index | lending_rate3m | nr_employed | subscribe |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 51 | admin. | divorced | professional.course | no | yes | yes | cellular | aug | mon | 4621 | 1 | 112 | 2 | failure | 1.4 | 90.81 | -35.53 | 0.69 | 5219.74 | no |
| 2 | 50 | services | married | high.school | unknown | yes | no | cellular | may | mon | 4715 | 1 | 412 | 2 | nonexistent | -1.8 | 96.33 | -40.58 | 4.05 | 4974.79 | yes |
| 3 | 48 | blue-collar | divorced | basic.9y | no | no | no | cellular | apr | wed | 171 | 0 | 1027 | 1 | failure | -1.8 | 96.33 | -44.74 | 1.5 | 5022.61 | no |
| 4 | 26 | entrepreneur | single | high.school | yes | yes | yes | cellular | aug | fri | 359 | 26 | 998 | 0 | nonexistent | 1.4 | 97.08 | -35.55 | 5.11 | 5222.87 | yes |
| 5 | 45 | admin. | single | university.degree | no | no | no | cellular | nov | tue | 3178 | 1 | 240 | 4 | success | -3.4 | 89.82 | -33.83 | 1.17 | 4884.7 | no |
| | count | mean | std | min | 25% | 50% | 75% | max |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| id | 22500 | 11250.5 | 6495.334864 | 1 | 5625.75 | 11250.5 | 16875.25 | 22500 |
| age | 22500 | 40.40751111 | 12.08607758 | 16 | 32 | 38 | 47 | 101 |
| duration | 22500 | 1146.303733 | 1432.432125 | 0 | 143 | 353 | 1873 | 5149 |
| campaign | 22500 | 3.3648 | 7.223836793 | 0 | 1 | 1 | 3 | 57 |
| pdays | 22500 | 773.9919556 | 326.9343344 | 0 | 557.75 | 964 | 1005 | 1048 |
| previous | 22500 | 1.316444444 | 1.918733345 | 0 | 0 | 0 | 2 | 6 |
| emp_var_rate | 22500 | 0.078528889 | 1.573831196 | -3.4 | -1.8 | 1.1 | 1.4 | 1.4 |
| cons_price_index | 22500 | 93.54878533 | 2.805786273 | 87.64 | 91.19 | 93.54 | 95.92 | 99.46 |
| cons_conf_index | 22500 | -39.87718044 | 5.805441863 | -53.28 | -44.16 | -40.6 | -35.7975 | -25.55 |
| lending_rate3m | 22500 | 3.302490222 | 1.611777222 | 0.6 | 1.43 | 3.92 | 4.83 | 5.27 |
| nr_employed | 22500 | 5137.211285 | 170.6706111 | 4715.42 | 5008.51 | 5133.955 | 5267.6775 | 5489.5 |
- no 19548
- yes 2952
- Name: subscribe, dtype: int64
- Nu_features 11 ['id', 'age', 'duration', 'campaign', 'pdays', 'previous', 'emp_var_rate', 'cons_price_index', 'cons_conf_index', 'lending_rate3m', 'nr_employed']
- Ca_features 11 ['job', 'marital', 'education', 'default', 'housing', 'loan', 'contact', 'month', 'day_of_week', 'poutcome', 'subscribe']
- job : 12 {'unemployed', 'unknown', 'self-employed', 'student', 'retired', 'technician', 'housemaid', 'services', 'entrepreneur', 'management', 'blue-collar', 'admin.'}
- marital : 4 {'single', 'divorced', 'unknown', 'married'}
- education : 8 {'unknown', 'illiterate', 'professional.course', 'basic.4y', 'basic.6y', 'high.school', 'basic.9y', 'university.degree'}
- default : 3 {'unknown', 'no', 'yes'}
- housing : 3 {'unknown', 'yes', 'no'}
- loan : 3 {'unknown', 'yes', 'no'}
- contact : 2 {'telephone', 'cellular'}
- month : 10 {'jun', 'jul', 'dec', 'oct', 'sep', 'may', 'nov', 'apr', 'aug', 'mar'}
- day_of_week : 5 {'fri', 'thu', 'wed', 'mon', 'tue'}
- poutcome : 3 {'success', 'failure', 'nonexistent'}
- subscribe : 2 {'yes', 'no'}
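The feature lists and per-column value sets above could be produced along the following lines. This is a minimal sketch, not the article's own code: the helper name `summarize_feature_types` and the tiny `demo` frame (built from a few of the sample rows) are illustrative assumptions, and it treats every non-numeric column as categorical.

```python
import pandas as pd


def summarize_feature_types(df: pd.DataFrame):
    """Split columns into numeric and non-numeric, and collect unique values."""
    nu_features = df.select_dtypes(include="number").columns.tolist()
    ca_features = df.select_dtypes(exclude="number").columns.tolist()
    # Map each categorical column to the set of distinct values it contains.
    uniques = {col: set(df[col].unique()) for col in ca_features}
    return nu_features, ca_features, uniques


# Tiny stand-in frame (assumption): a few columns from the sample rows above.
demo = pd.DataFrame({
    "age": [51, 50, 48],
    "job": ["admin.", "services", "blue-collar"],
    "marital": ["divorced", "married", "divorced"],
    "subscribe": ["no", "yes", "no"],
})

nu_features, ca_features, uniques = summarize_feature_types(demo)
print("Nu_features", len(nu_features), nu_features)
print("Ca_features", len(ca_features), ca_features)
for col in ca_features:
    print(col, ":", len(uniques[col]), uniques[col])
```

Note that the label column `subscribe` ends up among the categorical features, as it does in the output above, so it has to be set aside before training.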
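mean validation AUC: 0.8925944143873619
mean validation AUC: 0.8823787365621856
Analysis: after binning, the mean validation AUC drops noticeably (from about 0.8926 to about 0.8824), so binning the current fields is not recommended.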
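For readers unfamiliar with feature binning, here is a minimal sketch of what it means for a numeric column. The text does not say which fields were binned or which scheme was used, so the choice of `age`, the cut points in `age_edges`, and the helper `bin_column` are all hypothetical.

```python
import pandas as pd


def bin_column(values: pd.Series, edges: list) -> pd.Series:
    """Replace each value by the integer index of the interval it falls into."""
    # right=False makes each bin closed on the left: [edge_i, edge_{i+1})
    return pd.cut(values, bins=edges, labels=False, right=False)


# Toy ages within the range reported in the summary statistics (16 to 101).
ages = pd.Series([16, 26, 38, 47, 51, 101])
age_edges = [0, 30, 40, 50, 60, 200]  # hypothetical cut points
age_bin = bin_column(ages, age_edges)
print(age_bin.tolist())  # [0, 0, 1, 2, 3, 4]
```

Binning throws away the ordering inside each interval. Tree models such as LightGBM already choose their own split points on the raw values, which is one plausible reason the binned run scores lower here.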
- Data columns (total 20 columns):
- # Column Non-Null Count Dtype
- --- ------ -------------- -----
- 0 age 30000 non-null int64
- 1 job 30000 non-null int32
- 2 marital 30000 non-null int32
- 3 education 30000 non-null int32
- 4 default 30000 non-null int32
- 5 housing 30000 non-null int32
- 6 loan 30000 non-null int32
- 7 contact 30000 non-null int32
- 8 month 30000 non-null int32
- 9 day_of_week 30000 non-null int32
- 10 duration 30000 non-null int64
- 11 campaign 30000 non-null int64
- 12 pdays 30000 non-null int64
- 13 previous 30000 non-null int64
- 14 poutcome 30000 non-null int32
- 15 emp_var_rate 30000 non-null float64
- 16 cons_price_index 30000 non-null float64
- 17 cons_conf_index 30000 non-null float64
- 18 lending_rate3m 30000 non-null float64
- 19 nr_employed 30000 non-null float64
- dtypes: float64(5), int32(10), int64(5)
- memory usage: 3.7 MB
- None
- train_test_split_Index: 22500
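The 30000-row frame above is consistent with the 22500 training rows and the test rows being concatenated, label-encoded together, and later split again at index 22500. A minimal sketch of that pattern follows, assuming pandas category codes as the encoder (the article may well have used a different one, such as scikit-learn's `LabelEncoder`). The helper `encode_together` and the toy frames are illustrative only.

```python
import pandas as pd


def encode_together(train: pd.DataFrame, test: pd.DataFrame, cat_cols: list):
    """Label-encode categorical columns on train+test combined, then split back."""
    split_index = len(train)  # rows before this index belong to the training set
    combined = pd.concat([train, test], ignore_index=True)
    for col in cat_cols:
        # Category codes are assigned in sorted order of the distinct values.
        combined[col] = combined[col].astype("category").cat.codes.astype("int32")
    return combined.iloc[:split_index], combined.iloc[split_index:], split_index


# Toy data (assumption): made-up rows using values that appear in the dataset.
train = pd.DataFrame({"age": [51, 50, 48],
                      "contact": ["cellular", "telephone", "cellular"]})
test = pd.DataFrame({"age": [26, 45],
                     "contact": ["telephone", "cellular"]})

train_enc, test_enc, split_index = encode_together(train, test, ["contact"])
print("train_test_split_Index:", split_index)
print(train_enc["contact"].tolist(), test_enc["contact"].tolist())
```

Encoding the two sets together guarantees that the same category receives the same integer in both, which would not be guaranteed if each set were encoded separately.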
- Validation AUC: 0.8883930791281923
- Validation AUC: 0.8894560893079624
- Validation AUC: 0.8978416119041924
- Validation AUC: 0.8998423076923078
- Validation AUC: 0.8874389839041544
- mean validation AUC: 0.8925944143873619
- subscribe subscribe_cat
- 0 0.051004 no
- 1 0.106801 no
- 2 0.009075 no
- 3 0.031288 no
- 4 0.047608 no
- ... ... ...
- 7495 0.219465 no
- 7496 0.055063 no
- 7497 0.113493 no
- 7498 0.006213 no
- 7499 0.055546 no
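The output above pairs each predicted probability with a `yes`/`no` label. One common way to derive the label is a fixed threshold, sketched below. The 0.5 cutoff is an assumption: the output only shows that a probability of about 0.22 still maps to `no`. The helper `to_labels` is hypothetical, and the last probability in the toy frame is made up to show the `yes` branch.

```python
import numpy as np
import pandas as pd


def to_labels(prob: pd.Series, threshold: float = 0.5) -> pd.Series:
    """Map predicted probabilities to 'yes'/'no' using a fixed cutoff."""
    return pd.Series(np.where(prob >= threshold, "yes", "no"), index=prob.index)


# First three values are copied from the output above; 0.731 is invented.
pred = pd.DataFrame({"subscribe": [0.051004, 0.106801, 0.009075, 0.731]})
pred["subscribe_cat"] = to_labels(pred["subscribe"])
print(pred)
```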