Random Forest Prediction and Importance Analysis (Python Implementation): Ranking the Importance of Environmental Factors

from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier
from functools import reduce
import numpy as np
import pandas as pd

# Load the data and define the basic settings
data = pd.read_excel('data2(Topsis评分评级).xlsx')
data = data.drop(columns=['ID'])
prediction_set = data[data['MM'].isna()]  # rows with no target value: to be predicted
training_set = data.dropna()              # complete rows: used for training
features = ['XXX', 'XXX']  # put the table's column names here
X = training_set[features]
y = training_set['MM']
minrmse = 1000
maxscore = 0
# Pick the best random_state (optional; disabled by wrapping it in a string).
# Note: a seed chosen by its test-set score makes that score optimistic.
'''
from sklearn.metrics import accuracy_score, mean_squared_error

for randomstate in range(50, -1, -1):
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=randomstate)
    model = RandomForestClassifier(random_state=randomstate)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    score = accuracy_score(y_test, y_pred)
    rmse = np.sqrt(mean_squared_error(y_test, y_pred))
    if score > maxscore:
        maxscore = score
        print(maxscore, randomstate)
X_pred = prediction_set[features]
y_pred_prediction_set = model.predict(X_pred)
'''
# Method 1: random forest classifier
X_train, X_test, y_train, y_test = train_test_split(X,
                                                    y,
                                                    test_size=0.2,
                                                    random_state=27)
model = RandomForestClassifier(random_state=27)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)  # predictions on the held-out test split
X_pred = prediction_set[features]
y_pred_prediction_set = model.predict(X_pred)  # predictions for rows missing 'MM'
y_pred_prediction_set = pd.DataFrame(y_pred_prediction_set)
y_pred_prediction_set.to_excel('Topsis评级预测-后20.xlsx')
# Method 2: random forest regressor
X_train, X_test, y_train, y_test = train_test_split(X,
                                                    y,
                                                    test_size=0.2,
                                                    random_state=27)
model = RandomForestRegressor(random_state=27)  # regressor, not classifier
model.fit(X_train, y_train)
y_pred = model.predict(X_test)  # predictions on the held-out test split
X_pred = prediction_set[features]
y_pred_prediction_set = model.predict(X_pred)  # predictions for rows missing 'MM'
y_pred_prediction_set = pd.DataFrame(y_pred_prediction_set)
# Same file name as in method 1, so this overwrites that output
y_pred_prediction_set.to_excel('Topsis评级预测-后20.xlsx')
# Importance analysis
importances = model.feature_importances_
importances_df = pd.DataFrame({
    'Feature': features,
    'Importance_behavior': importances
})
# Importance tables to merge on "Feature" (only one here); the list must
# hold DataFrames, not the raw importances array
dfs = [importances_df]
df = reduce(lambda x, y: pd.merge(x, y, on="Feature", how="outer"), dfs)
df = pd.DataFrame(df)
df.to_excel('TopsisImportances-后20(8个特征重要性分析).xlsx')
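
The listing above computes y_pred on the test split but never scores it, and it writes the importances unsorted. The following is a minimal sketch of both steps on synthetic data. The column names temperature, humidity and wind, and the rule that generates MM, are invented for illustration and do not come from the original spreadsheet.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real table: three hypothetical factors,
# of which only "temperature" determines the class label.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "temperature": rng.normal(size=300),
    "humidity": rng.normal(size=300),
    "wind": rng.normal(size=300),
})
demo["MM"] = (demo["temperature"] > 0).astype(int)

demo_features = ["temperature", "humidity", "wind"]
X_train, X_test, y_train, y_test = train_test_split(
    demo[demo_features], demo["MM"], test_size=0.2, random_state=27)

clf = RandomForestClassifier(random_state=27)
clf.fit(X_train, y_train)

# Score the held-out split before trusting predictions on unlabeled rows.
test_accuracy = accuracy_score(y_test, clf.predict(X_test))

# Rank the features from most to least important.
ranking = (pd.Series(clf.feature_importances_, index=demo_features)
           .sort_values(ascending=False))
print(f"test accuracy: {test_accuracy:.3f}")
print(ranking)
```

For the regressor, the same pattern would apply with a regression metric such as RMSE in place of accuracy.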

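Note that if you use the classifier, y must take discrete values.

If you use the regressor, y does not have to be discrete, but it must be numeric.
So with either method, convert the target column to numeric values first. (scikit-learn classifiers can also take string class labels, but numeric codes let the same column serve both methods.)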
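
One possible way to do that conversion is sketched here, assuming the target holds text ratings. The labels "good", "fair" and "poor" are hypothetical stand-ins for whatever the MM column actually contains. LabelEncoder assigns codes in alphabetical order, which suits a classifier; for a regressor the codes should reflect the real ordering, so an explicit mapping is shown as well.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical text ratings standing in for the 'MM' target column.
ratings = pd.Series(["good", "poor", "fair", "good", "fair"], name="MM")

# Classifier: any distinct integer code per class is enough.
encoder = LabelEncoder()
y_encoded = encoder.fit_transform(ratings)

# Map integer codes (e.g. model predictions) back to the original labels.
decoded = encoder.inverse_transform(y_encoded)

# Regressor: the numbers carry magnitude, so spell out the order explicitly.
order = {"poor": 0, "fair": 1, "good": 2}  # assumed ranking, worst to best
y_ordinal = ratings.map(order)

print(list(encoder.classes_), list(y_encoded), list(y_ordinal))
```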
