Grid Search
GridSearchCV
class sklearn.model_selection.GridSearchCV
Exhaustive search over specified parameter values for an estimator.
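To make "exhaustive" concrete, here is a minimal sketch in plain Python of how a grid expands into candidate settings: every combination of the listed values is tried once. The helper name `expand_grid` is made up for this illustration and is not part of scikit-learn; scikit-learn provides `sklearn.model_selection.ParameterGrid` for the same job.

```python
from itertools import product


def expand_grid(param_grid):
    """Return every parameter combination in a grid (hypothetical helper)."""
    # Accept a single dict or a list of dicts, like GridSearchCV's param_grid.
    grids = [param_grid] if isinstance(param_grid, dict) else list(param_grid)
    candidates = []
    for grid in grids:
        names = sorted(grid)
        # Cartesian product of the value lists, one tuple per candidate.
        for values in product(*(grid[name] for name in names)):
            candidates.append(dict(zip(names, values)))
    return candidates


candidates = expand_grid({"kernel": ["linear", "rbf"], "C": [1, 10]})
for params in candidates:
    print(params)
print(len(candidates), "candidates")
```

With 10-fold cross-validation, these 4 candidates mean 40 model fits, plus one final refit on the whole training set when refit=True. The cost grows multiplicatively with every parameter added to the grid.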
estimator : estimator object
param_grid : dict or list of dictionaries
scoring : str, callable, list/tuple or dict, default=None
n_jobs : int, default=None
pre_dispatch : int or str, default='2*n_jobs'
iid : bool, default=False
    Deprecated since version 0.22: the iid parameter was deprecated in 0.22 and will be removed in 0.24.
cv : int, cross-validation generator or an iterable, default=None
refit : bool, str, or callable, default=True
verbose : int
error_score : 'raise' or numeric, default=np.nan
return_train_score : bool, default=False
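Several of these parameters interact, so a short sketch may help. It assumes the iris dataset and an SVC purely for illustration; the scorer labels "acc" and "f1" are arbitrary names chosen here. It shows param_grid as a list of dicts, scoring as a dict of several metrics, and refit naming the metric that selects the best model.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# A list of dicts defines separate sub-grids: gamma is only searched for the RBF kernel.
param_grid = [
    {"kernel": ["linear"], "C": [1, 10]},
    {"kernel": ["rbf"], "C": [1, 10], "gamma": [0.01, 0.1]},
]

# A dict for scoring evaluates several metrics; the keys are arbitrary labels.
scoring = {"acc": "accuracy", "f1": "f1_micro"}

search = GridSearchCV(
    SVC(),
    param_grid,
    scoring=scoring,
    refit="f1",  # with several metrics, refit names the one used to pick the best model
    cv=5,
    n_jobs=1,
    return_train_score=True,
)
search.fit(X, y)

print(len(search.cv_results_["params"]))  # number of candidates: 2 + 4
print(search.best_params_)
print(search.cv_results_["mean_test_acc"])
print(search.cv_results_["mean_train_f1"])
```

Each scorer label shows up in the result keys (mean_test_acc, mean_test_f1, and so on), and the train-score columns only exist because return_train_score=True was set.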
cv_results_ : dict of numpy (masked) ndarrays
    A dict with keys as column headers and values as columns, which can be imported into a pandas DataFrame.
    The key "params" stores a list of the parameter settings for all candidates.
    mean_fit_time, std_fit_time, mean_score_time and std_score_time are all in seconds.
best_estimator_ : estimator
best_score_ : float
best_params_ : dict
best_index_ : int
scorer_ : function or a dict
n_splits_ : int
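The attributes above are easiest to read side by side. The following sketch, again assuming iris and an SVC only as stand-ins, loads cv_results_ into a pandas DataFrame and checks how best_index_, best_params_ and best_score_ relate to its rows.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
search = GridSearchCV(SVC(), {"kernel": ["linear", "rbf"], "C": [1, 10]}, cv=5)
search.fit(X, y)

# One row per candidate, one column per cv_results_ key.
results = pd.DataFrame(search.cv_results_)
columns = ["params", "mean_test_score", "std_test_score",
           "rank_test_score", "mean_fit_time"]
print(results[columns].sort_values("rank_test_score"))

# best_index_ is the row of the winning candidate.
best_row = results.iloc[search.best_index_]
print(best_row["params"] == search.best_params_)
print(best_row["mean_test_score"] == search.best_score_)
print(search.n_splits_)
```

Sorting by rank_test_score is usually more informative than looking at best_params_ alone, because it shows how close the runner-up candidates were.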
decision_function(self, X)
fit(self, X[, y, groups])
get_params(self[, deep])
inverse_transform(self, Xt)
predict(self, X)
predict_log_proba(self, X)
predict_proba(self, X)
score(self, X[, y])
set_params(self, **params)
    Set the parameters of this estimator.
transform(self, X)
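A fitted search object can be used like an ordinary estimator. As a rough sketch (iris and SVC are assumptions for the example), the calls below show predict being forwarded to the refitted best_estimator_, score using the metric given in scoring, and get_params/set_params addressing the wrapped estimator through an "estimator__" prefix.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

search = GridSearchCV(SVC(), {"C": [1, 10]}, cv=5, scoring="f1_micro")
search.fit(X_train, y_train)

# predict() on the search object is forwarded to the refitted best_estimator_.
same = np.array_equal(search.predict(X_test),
                      search.best_estimator_.predict(X_test))
print(same)

# score() uses the metric passed as `scoring`, here micro-averaged F1.
manual = f1_score(y_test, search.predict(X_test), average="micro")
print(np.isclose(search.score(X_test, y_test), manual))

# get_params() lists the wrapped estimator's parameters under "estimator__";
# set_params() accepts the same names.
print(search.get_params()["estimator__C"])
search.set_params(estimator__C=0.5, cv=3)
print(search.get_params()["estimator__C"], search.get_params()["cv"])
```

Methods such as predict_proba or transform are only usable when the underlying estimator implements them and refit was left enabled.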
    # Import libraries
    from sklearn import datasets, metrics, svm
    from sklearn.model_selection import GridSearchCV, train_test_split

    # Dataset
    iris = datasets.load_iris()
    X_train, X_test, y_train, y_test = train_test_split(
        iris.data, iris.target, test_size=0.3, random_state=42)

    # Train the model
    parameters = {'kernel': ('linear', 'rbf'), 'C': [1, 10]}
    svc = svm.SVC()
    clf = GridSearchCV(svc, parameters, cv=10, scoring="f1_micro")
    clf.fit(X_train, y_train)

    # Inspect the best score and the best parameters
    print(clf.best_score_)
    print(clf.best_params_)

    # Get the best model
    best_model = clf.best_estimator_

    # Predict with the best model and evaluate on the held-out test set
    y_predict = best_model.predict(X_test)
    print(metrics.f1_score(y_test, y_predict, average='micro'))
    import numpy as np
    import pandas as pd
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score, precision_score, recall_score
    from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import LabelEncoder

    # Pipeline: TF-IDF features followed by logistic regression.
    # The liblinear solver supports both the 'l1' and 'l2' penalties searched below.
    pipeline = Pipeline([
        ('vect', TfidfVectorizer(stop_words='english')),
        ('clf', LogisticRegression(solver='liblinear')),
    ])

    # Grid keys have the form <step name>__<parameter name>.
    parameters = {
        'vect__max_df': (0.25, 0.5, 0.75),
        'vect__stop_words': ('english', None),
        'vect__max_features': (2500, 5000, None),
        'vect__ngram_range': ((1, 1), (1, 2)),
        'vect__use_idf': (True, False),
        'clf__penalty': ('l1', 'l2'),
        'clf__C': (0.01, 0.1, 1, 10),
    }

    # Load the data and encode the string labels as integers.
    df = pd.read_csv('./sms.csv')
    X = df['message']
    label_encoder = LabelEncoder()
    y = label_encoder.fit_transform(df['label'])
    X_train, X_test, y_train, y_test = train_test_split(X, y)

    # Search the whole pipeline with 3-fold cross-validation on all CPU cores.
    grid_search = GridSearchCV(pipeline, parameters, n_jobs=-1, verbose=1,
                               scoring='accuracy', cv=3)
    grid_search.fit(X_train, y_train)

    print('Best score: %0.3f' % grid_search.best_score_)
    print('Best parameters set:')
    best_parameters = grid_search.best_estimator_.get_params()
    for param_name in sorted(parameters.keys()):
        print('\t%s: %r' % (param_name, best_parameters[param_name]))

    # Evaluate the refitted best pipeline on the held-out test set.
    predictions = grid_search.predict(X_test)
    print('Accuracy: %s' % accuracy_score(y_test, predictions))
    print('Precision: %s' % precision_score(y_test, predictions))
    print('Recall: %s' % recall_score(y_test, predictions))

    # Baseline for comparison: default settings, no grid search.
    # The encoded labels are reused so the 'precision', 'recall' and 'f1' scorers
    # receive 0/1 targets.
    X_train_raw, X_test_raw, y_train, y_test = train_test_split(X, y, random_state=11)
    vectorizer = TfidfVectorizer()
    X_train = vectorizer.fit_transform(X_train_raw)
    X_test = vectorizer.transform(X_test_raw)
    classifier = LogisticRegression()
    classifier.fit(X_train, y_train)

    scores = cross_val_score(classifier, X_train, y_train, cv=5)
    print('Accuracies: %s' % scores)
    print('Mean accuracy: %s' % np.mean(scores))
    precisions = cross_val_score(classifier, X_train, y_train, cv=5, scoring='precision')
    print('Precision: %s' % np.mean(precisions))
    recalls = cross_val_score(classifier, X_train, y_train, cv=5, scoring='recall')
    print('Recall: %s' % np.mean(recalls))
    f1s = cross_val_score(classifier, X_train, y_train, cv=5, scoring='f1')
    print('F1 score: %s' % np.mean(f1s))
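The SMS example depends on a local ./sms.csv file. For readers without that file, here is a small self-contained sketch of the same pipeline search. The toy corpus and labels are invented for this illustration and are far too small to give meaningful scores; the point is only the step__parameter naming and the fact that the search accepts raw text.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Made-up toy corpus standing in for the SMS data (1 = spam, 0 = ham).
messages = [
    "win a free prize now",
    "free cash offer click now",
    "claim your free reward today",
    "urgent prize waiting click here",
    "are we meeting for lunch today",
    "please call me when you arrive",
    "see you at the office tomorrow",
    "thanks for the help yesterday",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

pipeline = Pipeline([
    ("vect", TfidfVectorizer()),
    ("clf", LogisticRegression()),
])

# Every tunable name has the form <step name>__<parameter name>.
print("vect__ngram_range" in pipeline.get_params())

param_grid = {
    "vect__ngram_range": [(1, 1), (1, 2)],
    "clf__C": [0.1, 1, 10],
}
search = GridSearchCV(pipeline, param_grid, cv=2, scoring="accuracy")
search.fit(messages, labels)

print(search.best_params_)
print(search.predict(["free prize click now", "see you at lunch"]))
```

Searching the pipeline as a whole means the vectorizer is refitted inside each cross-validation fold, so vocabulary and IDF weights from the validation fold do not leak into training.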