Python shap: A close look at how the X, y and X_display, y_display outputs differ in the shap.datasets.adult() source code
Contents
A close look at X, y and X_display, y_display in the shap.datasets.adult() source code
- X, y = shap.datasets.adult()
- X_display, y_display = shap.datasets.adult(display=True)
    def adult(display=False):
        """ Return the Adult census data in a nice package. """
        dtypes = [
            ("Age", "float32"), ("Workclass", "category"), ("fnlwgt", "float32"),
            ("Education", "category"), ("Education-Num", "float32"), ("Marital Status", "category"),
            ("Occupation", "category"), ("Relationship", "category"), ("Race", "category"),
            ("Sex", "category"), ("Capital Gain", "float32"), ("Capital Loss", "float32"),
            ("Hours per week", "float32"), ("Country", "category"), ("Target", "category")
        ]
        raw_data = pd.read_csv(
            cache(github_data_url + "adult.data"),
            names=[d[0] for d in dtypes],
            na_values="?",
            dtype=dict(dtypes)
        )
        data = raw_data.drop(["Education"], axis=1)  # redundant with Education-Num
        filt_dtypes = list(filter(lambda x: not (x[0] in ["Target", "Education"]), dtypes))
        data["Target"] = data["Target"] == " >50K"
        rcode = {
            "Not-in-family": 0,
            "Unmarried": 1,
            "Other-relative": 2,
            "Own-child": 3,
            "Husband": 4,
            "Wife": 5
        }
        for k, dtype in filt_dtypes:
            if dtype == "category":
                if k == "Relationship":
                    data[k] = np.array([rcode[v.strip()] for v in data[k]])
                else:
                    data[k] = data[k].cat.codes

        if display:
            return raw_data.drop(["Education", "Target", "fnlwgt"], axis=1), data["Target"].values
        return data.drop(["Target", "fnlwgt"], axis=1), data["Target"].values
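The encoding steps in the source above can be sketched without pandas. This is only a minimal illustration, not the library's code: the helper names (`encode_relationship`, `encode_category`, `encode_target`) and the sample values are hypothetical, and it assumes that category codes follow the sorted order of the distinct labels, with missing values coded as -1.

```python
# Hypothetical stand-in for the encoding loop in adult(); not shap's own code.
RCODE = {
    "Not-in-family": 0, "Unmarried": 1, "Other-relative": 2,
    "Own-child": 3, "Husband": 4, "Wife": 5,
}


def encode_relationship(values):
    """Map Relationship labels to the fixed codes in RCODE.

    The source strips each value first, because the raw fields
    carry a leading space (" Husband").
    """
    return [RCODE[v.strip()] for v in values]


def encode_category(values):
    """Imitate Series.cat.codes on a plain list.

    Assumption: categories are the sorted distinct labels;
    a missing value (None here) gets the code -1.
    """
    categories = sorted({v for v in values if v is not None})
    index = {label: code for code, label in enumerate(categories)}
    return [-1 if v is None else index[v] for v in values]


def encode_target(values):
    """Binarize the target the way the source does: compare with ' >50K'."""
    return [v == " >50K" for v in values]


# Made-up sample values, only for illustration.
print(encode_relationship([" Not-in-family", " Husband", " Wife"]))  # [0, 4, 5]
print(encode_category([" Male", " Female", None, " Male"]))          # [1, 0, -1, 1]
print(encode_target([" <=50K", " >50K", " <=50K"]))                  # [False, True, False]
```

Relationship is the only categorical column with a hand-written mapping; every other categorical column gets whatever integer its label happens to receive, so the codes carry no ordering that means anything.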
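Conclusion:
data: a new DataFrame derived from raw_data, the CSV as read in. Three columns are dropped in total (Education right after loading, then Target and fnlwgt in the return statement), the target is binarized by comparing it with " >50K", and the categorical features are encoded as integers (Relationship through the rcode mapping, the rest through .cat.codes). What comes back as X is therefore an all-numeric DataFrame, and y is a boolean array.
raw_data: the original data as read from the CSV. Apart from dropping the same three columns (Education, Target, fnlwgt), it is returned unchanged as X_display, so its categorical columns keep their string labels. Both branches return data["Target"].values, so y and y_display are identical.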
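To make the conclusion concrete, here is a toy sketch of the two return paths. `toy_adult`, `RAW_ROWS` and the three sample rows are hypothetical and only imitate the structure of adult(), and the integer codes assume the labels are sorted. Both calls return the same rows, the same columns and the same labels; the only difference is whether categorical values are strings or integer codes.

```python
# Toy imitation of the display switch in adult(); names and data are made up.
RAW_ROWS = [
    {"Age": 39.0, "Workclass": " State-gov", "fnlwgt": 1.0,
     "Education": " Bachelors", "Target": " <=50K"},
    {"Age": 50.0, "Workclass": " Self-emp-not-inc", "fnlwgt": 2.0,
     "Education": " Bachelors", "Target": " >50K"},
    {"Age": 38.0, "Workclass": " Private", "fnlwgt": 3.0,
     "Education": " HS-grad", "Target": " <=50K"},
]
DROPPED = ("Education", "Target", "fnlwgt")
CATEGORICAL = ("Workclass",)


def _codes(column):
    """Integer code per label, assuming categories are the sorted labels."""
    labels = sorted({row[column] for row in RAW_ROWS})
    return {label: code for code, label in enumerate(labels)}


def toy_adult(display=False):
    """Return (features, labels) in the same spirit as adult()."""
    # The labels are computed once and shared by both branches.
    labels = [row["Target"] == " >50K" for row in RAW_ROWS]
    features = [{k: v for k, v in row.items() if k not in DROPPED}
                for row in RAW_ROWS]
    if display:
        # Display path: raw values, minus the three dropped columns.
        return features, labels
    # Model path: same columns, categorical strings become integer codes.
    for column in CATEGORICAL:
        codes = _codes(column)
        for row in features:
            row[column] = codes[row[column]]
    return features, labels


X, y = toy_adult()
X_display, y_display = toy_adult(display=True)
print(X[0])            # {'Age': 39.0, 'Workclass': 2}
print(X_display[0])    # {'Age': 39.0, 'Workclass': ' State-gov'}
print(y == y_display)  # True
```

In practice this is why the two results are used together: the numeric X goes to the model, while X_display supplies readable values for the same rows when plotting or inspecting.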
X.shape: (32561, 12)

           age          workclass  ...  hours-per-week  native-country
    0       39          State-gov  ...              40   United-States
    1       50   Self-emp-not-inc  ...              13   United-States
    2       38            Private  ...              40   United-States
    3       53            Private  ...              40   United-States
    4       28            Private  ...              40            Cuba
    ...    ...                ...  ...             ...             ...
    32556   27            Private  ...              38   United-States
    32557   40            Private  ...              40   United-States
    32558   58            Private  ...              40   United-States
    32559   22            Private  ...              20   United-States
    32560   52       Self-emp-inc  ...              40   United-States

    [32561 rows x 12 columns]

| | age | workclass | education-num | marital-status | occupation | relationship | race | sex | capital-gain | capital-loss | hours-per-week | native-country |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 39 | State-gov | 13 | Never-married | Adm-clerical | Not-in-family | White | Male | 2174 | 0 | 40 | United-States |
| 1 | 50 | Self-emp-not-inc | 13 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 0 | 0 | 13 | United-States |
| 2 | 38 | Private | 9 | Divorced | Handlers-cleaners | Not-in-family | White | Male | 0 | 0 | 40 | United-States |
| 3 | 53 | Private | 7 | Married-civ-spouse | Handlers-cleaners | Husband | Black | Male | 0 | 0 | 40 | United-States |
| 4 | 28 | Private | 13 | Married-civ-spouse | Prof-specialty | Wife | Black | Female | 0 | 0 | 40 | Cuba |
| 5 | 37 | Private | 14 | Married-civ-spouse | Exec-managerial | Wife | White | Female | 0 | 0 | 40 | United-States |
| 6 | 49 | Private | 5 | Married-spouse-absent | Other-service | Not-in-family | Black | Female | 0 | 0 | 16 | Jamaica |
| 7 | 52 | Self-emp-not-inc | 9 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 0 | 0 | 45 | United-States |
| 8 | 31 | Private | 14 | Never-married | Prof-specialty | Not-in-family | White | Female | 14084 | 0 | 50 | United-States |
| 9 | 42 | Private | 13 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 5178 | 0 | 40 | United-States |

X_display.shape: (32561, 12)

           age          workclass  ...  hours-per-week  native-country
    0       39          State-gov  ...              40   United-States
    1       50   Self-emp-not-inc  ...              13   United-States
    2       38            Private  ...              40   United-States
    3       53            Private  ...              40   United-States
    4       28            Private  ...              40            Cuba
    ...    ...                ...  ...             ...             ...
    32556   27            Private  ...              38   United-States
    32557   40            Private  ...              40   United-States
    32558   58            Private  ...              40   United-States
    32559   22            Private  ...              20   United-States
    32560   52       Self-emp-inc  ...              40   United-States

    [32561 rows x 12 columns]

| | age | workclass | education-num | marital-status | occupation | relationship | race | sex | capital-gain | capital-loss | hours-per-week | native-country |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 39 | State-gov | 13 | Never-married | Adm-clerical | Not-in-family | White | Male | 2174 | 0 | 40 | United-States |
| 1 | 50 | Self-emp-not-inc | 13 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 0 | 0 | 13 | United-States |
| 2 | 38 | Private | 9 | Divorced | Handlers-cleaners | Not-in-family | White | Male | 0 | 0 | 40 | United-States |
| 3 | 53 | Private | 7 | Married-civ-spouse | Handlers-cleaners | Husband | Black | Male | 0 | 0 | 40 | United-States |
| 4 | 28 | Private | 13 | Married-civ-spouse | Prof-specialty | Wife | Black | Female | 0 | 0 | 40 | Cuba |
| 5 | 37 | Private | 14 | Married-civ-spouse | Exec-managerial | Wife | White | Female | 0 | 0 | 40 | United-States |
| 6 | 49 | Private | 5 | Married-spouse-absent | Other-service | Not-in-family | Black | Female | 0 | 0 | 16 | Jamaica |
| 7 | 52 | Self-emp-not-inc | 9 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 0 | 0 | 45 | United-States |
| 8 | 31 | Private | 14 | Never-married | Prof-specialty | Not-in-family | White | Female | 14084 | 0 | 50 | United-States |
| 9 | 42 | Private | 13 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 5178 | 0 | 40 | United-States |