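Random search:
First, define a random grid of hyperparameters for the random forest regressor.
Then use RandomizedSearchCV to search for the best hyperparameters over the specified random grid.
Extract the best estimator (best_random) from the random search.
Evaluating the random search model:
Evaluate the baseline random forest regressor (base_model) on the test set.
Next, evaluate the best estimator from the random search (best_random) on the test set.
Grid search (first round):
Define a new, narrower parameter grid (param_grid) for further hyperparameter tuning.
Use GridSearchCV to run an exhaustive search over the specified parameter grid.
Extract the best estimator (best_grid) from this grid search.
Evaluating the first grid search model:
Evaluate the best estimator from the first grid search (best_grid) on the test set.
Grid search (second round):
Define another parameter grid (param_grid) for a second round of hyperparameter tuning.
Use GridSearchCV again to search for the best hyperparameters over the new parameter grid.
Extract the best estimator (best_grid_2) from the second grid search.
Evaluating the second grid search model:
Evaluate the best estimator from the second grid search (best_grid_2) on the test set.
Additional grid search:
Finally, run one more grid search (grid_search_ad) over other hyperparameter values.
Extract the best estimator (best_grid_ad) from this additional grid search.
Evaluate that best estimator on the test set.
Printing the final model parameters:
Use pprint to print the final set of hyperparameters of the best model (best_grid_ad).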
import pandas as pd
features= pd.read_csv('data/temps_extended.csv')
features = pd.get_dummies(features)

labels = features['actual']
features = features.drop('actual', axis = 1)

feature_list = list(features.columns)

import numpy as np

features = np.array(features)
labels = np.array(labels)

from sklearn.model_selection import train_test_split

train_features, test_features, train_labels, test_labels = train_test_split(
    features, labels, test_size = 0.25, random_state = 42)

# print('Training Features Shape:', train_features.shape)
# print('Training Labels Shape:', train_labels.shape)
# print('Testing Features Shape:', test_features.shape)
# print('Testing Labels Shape:', test_labels.shape)
#
# print('{:0.1f} years of data in the training set'.format(train_features.shape[0] / 365.))
# print('{:0.1f} years of data in the test set'.format(test_features.shape[0] / 365.))
important_feature_names = ['temp_1', 'average', 'ws_1', 'temp_2', 'friend', 'year']
important_indices = [feature_list.index(feature) for feature in important_feature_names]
important_train_features = train_features[:, important_indices]
important_test_features = test_features[:, important_indices]
# print('Important train features shape:', important_train_features.shape)
# print('Important test features shape:', important_test_features.shape)
train_features = important_train_features[:]
test_features = important_test_features[:]
feature_list = important_feature_names[:]
from sklearn.ensemble import RandomForestRegressor
rf = RandomForestRegressor(random_state = 42)
from pprint import pprint
# Print all parameters
pprint(rf.get_params())
from sklearn.model_selection import RandomizedSearchCV

# Number of trees to build
n_estimators = [int(x) for x in np.linspace(start = 200, stop = 2000, num = 10)]
# How the maximum number of features per split is chosen
max_features = ['sqrt']
# Maximum depth of each tree
max_depth = [int(x) for x in np.linspace(10, 20, num = 2)]
max_depth.append(None)
# Minimum number of samples required to split a node
min_samples_split = [2, 5, 10]
# Minimum number of samples in a leaf; no split may leave a child with fewer
min_samples_leaf = [1, 2, 4]
# Sampling method
bootstrap = [True, False]

# Random grid
random_grid = {'n_estimators': n_estimators,
               'max_features': max_features,
               'max_depth': max_depth,
               'min_samples_split': min_samples_split,
               'min_samples_leaf': min_samples_leaf,
               'bootstrap': bootstrap}
rf = RandomForestRegressor()
rf_random = RandomizedSearchCV(estimator=rf, param_distributions=random_grid,
                               n_iter = 10, scoring='neg_mean_absolute_error',
                               cv = 3, verbose=2, random_state=42, n_jobs=-1)
# Run the search
rf_random.fit(train_features, train_labels)
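To get a feel for why random search comes first, it can help to count the candidates. The sketch below rebuilds the same random_grid (max_depth is written out as [10, 20, None], which is what the np.linspace expression plus append(None) produces) and uses scikit-learn's ParameterGrid purely as a counting tool. The names n_candidates and n_fits are introduced here for illustration and do not appear in the original code.

```python
import numpy as np
from sklearn.model_selection import ParameterGrid

# Same grid as above, written out explicitly
random_grid = {
    'n_estimators': [int(x) for x in np.linspace(start=200, stop=2000, num=10)],
    'max_features': ['sqrt'],
    'max_depth': [10, 20, None],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4],
    'bootstrap': [True, False],
}

# Total number of distinct hyperparameter combinations in the grid
n_candidates = len(ParameterGrid(random_grid))

# RandomizedSearchCV samples n_iter of them and fits each one cv times
n_iter, cv = 10, 3
n_fits = n_iter * cv

print('Combinations in the grid:', n_candidates)  # 10 * 1 * 3 * 3 * 3 * 2 = 540
print('Fits run by the random search:', n_fits)   # 10 * 3 = 30
```

An exhaustive search over this grid with the same cv = 3 would need 540 * 3 = 1620 fits, so the random search looks at only a small sample of the space and the later grid searches refine around what it finds.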
print(rf_random.best_params_)

# Evaluation function
def evaluate(model, test_features, test_labels):
    predictions = model.predict(test_features)
    errors = abs(predictions - test_labels)
    mape = 100 * np.mean(errors / test_labels)
    accuracy = 100 - mape

    print('Average temperature error:', np.mean(errors))
    print('Accuracy = {:0.2f}%.'.format(accuracy))

# Check the result against the old (baseline) model
base_model = RandomForestRegressor(random_state=42)
base_model.fit(train_features, train_labels)
evaluate(base_model, test_features, test_labels)

# New recipe (best parameters found)
best_random = rf_random.best_estimator_
evaluate(best_random, test_features, test_labels)
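The evaluate function reports two numbers: the mean absolute error, and 100 minus the mean absolute percentage error (MAPE), which it calls accuracy. A small hand-checkable sketch of the same arithmetic follows, using three made-up predictions and labels that are not taken from temps_extended.csv:

```python
import numpy as np

# Made-up values for illustration only
predictions = np.array([70.0, 62.0, 55.0])
test_labels = np.array([68.0, 64.0, 50.0])

# Absolute error per sample: 2, 2 and 5 degrees
errors = np.abs(predictions - test_labels)

# Mean absolute error: (2 + 2 + 5) / 3 = 3.0
mae = np.mean(errors)

# Each error is divided by its own label before averaging
mape = 100 * np.mean(errors / test_labels)
accuracy = 100 - mape

print('Average temperature error:', mae)
print('Accuracy = {:0.2f}%.'.format(accuracy))
```

Because each error is divided by its label, this measure only makes sense when the labels are nonzero and all of the same sign. That seems a reasonable assumption for the temperatures used here, but it is not checked by the code.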
from sklearn.model_selection import GridSearchCV

# Grid search
param_grid = {
    'bootstrap': [True],
    'max_depth': [8, 10, 12],
    'max_features': ['sqrt'],
    'min_samples_leaf': [2, 3, 4, 5, 6],
    'min_samples_split': [3, 5, 7],
    'n_estimators': [800, 900, 1000, 1200]
}

# Choose the base model
rf = RandomForestRegressor()

# Grid search
grid_search = GridSearchCV(estimator = rf, param_grid = param_grid,
                           scoring = 'neg_mean_absolute_error', cv = 3,
                           n_jobs = -1, verbose = 2)

# Run the search
grid_search.fit(train_features, train_labels)
print(grid_search.best_params_)

best_grid = grid_search.best_estimator_
evaluate(best_grid, test_features, test_labels)
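Besides best_params_, a fitted GridSearchCV keeps the cross-validated score of every candidate in cv_results_, which helps when deciding how to narrow the next grid. The sketch below is a minimal illustration on synthetic data from make_regression, with a deliberately tiny grid and forest so that it runs quickly. The data, small_grid, and the mean_mae column are assumptions made for the example, not part of the original workflow.

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the temperature data (illustration only)
X, y = make_regression(n_samples=200, n_features=6, noise=10.0, random_state=42)

# A deliberately small grid: 2 * 2 = 4 candidates
small_grid = {'max_depth': [4, 8], 'min_samples_leaf': [2, 4]}

search = GridSearchCV(
    estimator=RandomForestRegressor(n_estimators=20, random_state=42),
    param_grid=small_grid,
    scoring='neg_mean_absolute_error',
    cv=3,
)
search.fit(X, y)

# One row per candidate; the scorer is negated MAE, so flip the sign
results = pd.DataFrame(search.cv_results_)
results['mean_mae'] = -results['mean_test_score']
print(results[['params', 'mean_mae', 'rank_test_score']]
      .sort_values('rank_test_score'))

# Cross-validated MAE of the best candidate
best_cv_mae = -search.best_score_
print('Best cross-validated MAE:', best_cv_mae)
```

With scoring = 'neg_mean_absolute_error', scikit-learn maximizes the negated error, so the candidate ranked first is the one with the smallest mean_mae.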
param_grid = {
    'bootstrap': [True],
    'max_depth': [12, 15, None],
    'max_features': [3, 4],
    'min_samples_leaf': [5, 6, 7],
    'min_samples_split': [7, 10, 13],
    'n_estimators': [900, 1000, 1200]
}

# Choose the model
rf = RandomForestRegressor()

# Keep searching
grid_search_ad = GridSearchCV(estimator = rf, param_grid = param_grid,
                              scoring = 'neg_mean_absolute_error', cv = 3,
                              n_jobs = -1, verbose = 2)

grid_search_ad.fit(train_features, train_labels)
print(grid_search_ad.best_params_)

best_grid_ad = grid_search_ad.best_estimator_
evaluate(best_grid_ad, test_features, test_labels)

## Final model
print('Final model parameters:\n')
pprint(best_grid_ad.get_params())
Final model parameters:

{'bootstrap': True,
 'ccp_alpha': 0.0,
 'criterion': 'squared_error',
 'max_depth': None,
 'max_features': 4,
 'max_leaf_nodes': None,
 'max_samples': None,
 'min_impurity_decrease': 0.0,
 'min_samples_leaf': 7,
 'min_samples_split': 13,
 'min_weight_fraction_leaf': 0.0,
 'n_estimators': 900,
 'n_jobs': None,
 'oob_score': False,
 'random_state': None,
 'verbose': 0,
 'warm_start': False}
bootstrap: this parameter determines whether bootstrap samples are used when building the trees. When set to True, bootstrapping is enabled.
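One practical consequence of bootstrapping is that each tree leaves some rows out of its sample, so those rows can serve as a built-in validation set (the out-of-bag estimate). The sketch below shows this on synthetic data from make_regression. The data and the names rf_boot and oob_without_bootstrap are assumptions for illustration only.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic data, used only to illustrate the parameter
X, y = make_regression(n_samples=300, n_features=6, noise=10.0, random_state=42)

# bootstrap=True: each tree is trained on a sample drawn with replacement,
# so the rows a tree did not see can be used to score it (out-of-bag)
rf_boot = RandomForestRegressor(n_estimators=100, bootstrap=True,
                                oob_score=True, random_state=42)
rf_boot.fit(X, y)
print('Out-of-bag R^2:', rf_boot.oob_score_)

# bootstrap=False: every tree sees the whole training set, so there are
# no out-of-bag rows and scikit-learn rejects oob_score=True
try:
    RandomForestRegressor(n_estimators=100, bootstrap=False,
                          oob_score=True).fit(X, y)
    oob_without_bootstrap = True
except ValueError as exc:
    oob_without_bootstrap = False
    print('bootstrap=False with oob_score=True raised:', exc)
```

With bootstrap=False the trees differ only through the random feature selection at each split (and any other randomness in the tree builder), which may explain why the grids above keep bootstrap fixed at True after the random search.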