
Preprocessing: Assessing Feature Importance with Random Forests in Python


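A random forest measures feature importance as the average decrease in impurity computed across all decision trees in the forest, without making any assumption about whether the data is linearly separable.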
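As a rough illustration of what "impurity decrease" means for a single split, the sketch below computes the Gini impurity of a node and how much a split reduces it. The helper names `gini` and `impurity_decrease` are made up for this example, and the toy labels are not taken from the Wine data. A real forest additionally weights each split by the share of samples reaching the node and averages the result over all trees.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k ** 2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_decrease(parent, left, right):
    """Impurity of the parent minus the size-weighted impurity of its children."""
    n = len(parent)
    weighted_children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted_children


# Toy node with two balanced classes (hypothetical data).
parent = [0, 0, 0, 1, 1, 1]

# A split that separates the classes perfectly removes all impurity.
pure_split = impurity_decrease(parent, [0, 0, 0], [1, 1, 1])

# A split that barely separates the classes removes very little.
weak_split = impurity_decrease(parent, [0, 0, 1], [0, 1, 1])

print(f"pure split decrease: {pure_split:.3f}")  # 0.500
print(f"weak split decrease: {weak_split:.3f}")  # 0.056
```

A feature that is repeatedly chosen for splits like the first one accumulates a large total decrease and therefore ranks high.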

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel


df_wine = pd.read_csv("xxx\\wine.data",
                      header=None)

df_wine.columns = ['Class label', 'Alcohol', 'Malic acid', 'Ash',
                   'Alcalinity of ash', 'Magnesium', 'Total phenols',
                   'Flavanoids', 'Nonflavanoid phenols', 'Proanthocyanins',
                   'Color intensity', 'Hue', 'OD280/OD315 of diluted wines',
                   'Proline']

# print(df_wine['Class label'])
# print('Class labels', np.unique(df_wine['Class label']))
# print(df_wine.head())

X, y = df_wine.iloc[:, 1:].values, df_wine.iloc[:, 0].values

X_train, X_test, y_train, y_test = train_test_split(X, y,
                     test_size=0.3,
                     random_state=0,
                     stratify=y)

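# Note: the scaled copies created below are not used later in this script.
# Tree-based models such as random forests do not require feature scaling,
# so the forest is fit on the unscaled X_train.
mms = MinMaxScaler()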
X_train_norm = mms.fit_transform(X_train)
X_test_norm = mms.transform(X_test)

stdsc = StandardScaler()
X_train_std = stdsc.fit_transform(X_train)
X_test_std = stdsc.transform(X_test)


feat_labels = df_wine.columns[1:]

forest = RandomForestClassifier(n_estimators=500,
                                random_state=1)

forest.fit(X_train, y_train)
importances = forest.feature_importances_
print(importances)

indices = np.argsort(importances)[::-1]

for f in range(X_train.shape[1]):
    print("%2d) %-*s %f" % (f + 1, 60,
                            feat_labels[indices[f]],
                            importances[indices[f]]))

plt.title('Feature Importance')
plt.bar(range(X_train.shape[1]),
        importances[indices],
        align='center')

plt.xticks(range(X_train.shape[1]),
           feat_labels[indices], rotation=90)
plt.xlim([-1, X_train.shape[1]])
plt.tight_layout()
#plt.savefig('images/04_09.png', dpi=300)
plt.show()

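# To wrap up feature importance and random forests: scikit-learn also provides
# the SelectFromModel object, which selects features based on a user-specified
# threshold after the model has been fitted.
sfm = SelectFromModel(forest, threshold=0.1, prefit=True)  # prefit=True: the estimator passed to the constructor is already fitted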
X_selected = sfm.transform(X_train)
print('Number of features that meet this threshold criterion:',
      X_selected.shape[1])

for f in range(X_selected.shape[1]):
    print("%2d) %-*s %f" % (f + 1, 30,
                            feat_labels[indices[f]],
                            importances[indices[f]]))

Output:
[0.11852942 0.02564836 0.01327854 0.02236594 0.03135708 0.05087243
0.17475098 0.01335393 0.02556988 0.1439199 0.058739 0.13616194
0.1854526 ]

 1) Proline                        0.185453
 2) Flavanoids                     0.174751
 3) Color intensity                0.143920
 4) OD280/OD315 of diluted wines   0.136162
 5) Alcohol                        0.118529
 6) Hue                            0.058739
 7) Total phenols                  0.050872
 8) Magnesium                      0.031357
 9) Malic acid                     0.025648
10) Proanthocyanins                0.025570
11) Alcalinity of ash              0.022366
12) Nonflavanoid phenols           0.013354
13) Ash                            0.013279
Number of features that meet this threshold criterion: 5
 1) Proline                        0.185453
 2) Flavanoids                     0.174751
 3) Color intensity                0.143920
 4) OD280/OD315 of diluted wines   0.136162
 5) Alcohol                        0.118529
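The second listing in the script reuses the first entries of the sorted ranking, which works because the features above the threshold are exactly the top-ranked ones. A minimal sketch of the same selection, written as a boolean mask with plain NumPy on the values printed above, is shown here. It assumes that `SelectFromModel` keeps features whose importance is greater than or equal to the threshold, and it does not call scikit-learn itself.

```python
import numpy as np

# Feature names in column order, and the importances copied from the printed output.
feat_labels = np.array([
    'Alcohol', 'Malic acid', 'Ash', 'Alcalinity of ash', 'Magnesium',
    'Total phenols', 'Flavanoids', 'Nonflavanoid phenols', 'Proanthocyanins',
    'Color intensity', 'Hue', 'OD280/OD315 of diluted wines', 'Proline',
])
importances = np.array([
    0.11852942, 0.02564836, 0.01327854, 0.02236594, 0.03135708, 0.05087243,
    0.17475098, 0.01335393, 0.02556988, 0.1439199, 0.058739, 0.13616194,
    0.1854526,
])

threshold = 0.1
mask = importances >= threshold  # True for every feature that passes the threshold

# Sort the surviving features by importance, largest first.
order = np.argsort(importances[mask])[::-1]
selected_names = [str(name) for name in feat_labels[mask][order]]
selected_values = importances[mask][order]

print('Number of features that meet this threshold criterion:', int(mask.sum()))
for rank, (name, value) in enumerate(zip(selected_names, selected_values), start=1):
    print("%2d) %-30s %f" % (rank, name, value))
```

Selecting by mask keeps the names tied to the features that actually passed the threshold, rather than relying on the ranking and the selection happening to agree.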

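Output plot:
The plot ranks the features of the Wine dataset by their relative importance. Note that the feature importance values are normalized so that they sum to 1.
[Figure: bar chart of the feature importances, sorted in descending order]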
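The normalization can be checked directly on the printed importances. The short sketch below sums the values copied from the output and, as an extra illustration not in the original script, computes the cumulative share covered by the highest-ranked features.

```python
import numpy as np

# Importances copied from the printed output, in column order.
importances = np.array([
    0.11852942, 0.02564836, 0.01327854, 0.02236594, 0.03135708, 0.05087243,
    0.17475098, 0.01335393, 0.02556988, 0.1439199, 0.058739, 0.13616194,
    0.1854526,
])

# The importances are normalized, so the total should be 1 (up to rounding).
total = importances.sum()
print(f"sum of importances: {total:.6f}")

# Cumulative importance when features are taken from most to least important.
ranked = np.sort(importances)[::-1]
cumulative = np.cumsum(ranked)
print(f"share covered by the top 5 features: {cumulative[4]:.1%}")
```

Because the values sum to 1, each importance can be read as a share of the total impurity decrease attributed to that feature.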

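Disclaimer: This article was contributed by a user and does not represent the position of the wpsshop blog. Copyright belongs to the original author, and this site assumes no corresponding legal liability. If you find infringing content, please contact us. When reposting, please cite the source: https://www.wpsshop.cn/w/凡人多烦事01/article/detail/97630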