
RandomizedSearchCV and GridSearchCV: handling the "'list' object has no attribute 'values'" error raised when calling fit


[Python 3.5.0, scikit-learn 0.18.1]

Yesterday I ran into a problem: RandomizedSearchCV would not run no matter what I tried:

    from sklearn.model_selection import (GridSearchCV, RandomizedSearchCV,
                                         train_test_split)
    from sklearn.neighbors import KNeighborsClassifier

    # data and label are assumed to be pandas objects loaded earlier (not shown).
    # Split the dataset into training (75%) and test (25%) sets
    X_train, X_test, y_train, y_test = train_test_split(
        data, label, test_size=0.25, random_state=0)

    # Set the parameters by cross-validation.
    # NOTE: this is a list of three dicts, which is what triggers the error.
    tuned_parameters = [{'n_neighbors': range(2, 7)},
                        {'leaf_size': range(9, 100, 3)},
                        {'p': range(1, 5)}]
    svr = KNeighborsClassifier()
    scores = ['precision', 'recall']
    for score in scores:
        print("# Tuning hyper-parameters for %s" % score)
        print()
        # Flatten the (n_samples, 1) label column into a 1-D array
        labels = y_train.values
        c, r = labels.shape
        labels = labels.reshape(c,)
        clf = RandomizedSearchCV(svr, tuned_parameters, cv=5, n_jobs=-1, verbose=3)
        # clf = GridSearchCV(svr, tuned_parameters, cv=5, n_jobs=-1, verbose=3)
        clf.fit(X_train, labels)


The error is as follows:

 

    File "C:\ProgramData\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
      exec(compile(f.read(), filename, 'exec'), namespace)
    File "C:/Users/gzhuangzhongyi/Desktop/NetEase/test/RandomSearchCV_Functional.py", line 46, in <module>
      clf.fit(X_train, labels)
    File "C:\ProgramData\Anaconda3\lib\site-packages\sklearn\model_selection\_search.py", line 1190, in fit
      return self._fit(X, y, groups, sampled_params)
    File "C:\ProgramData\Anaconda3\lib\site-packages\sklearn\model_selection\_search.py", line 564, in _fit
      for parameters in parameter_iterable
    File "C:\ProgramData\Anaconda3\lib\site-packages\sklearn\externals\joblib\parallel.py", line 758, in __call__
      while self.dispatch_one_batch(iterator):
    File "C:\ProgramData\Anaconda3\lib\site-packages\sklearn\externals\joblib\parallel.py", line 603, in dispatch_one_batch
      tasks = BatchedCalls(itertools.islice(iterator, batch_size))
    File "C:\ProgramData\Anaconda3\lib\site-packages\sklearn\externals\joblib\parallel.py", line 127, in __init__
      self.items = list(iterator_slice)
    File "C:\ProgramData\Anaconda3\lib\site-packages\sklearn\model_selection\_search.py", line 557, in <genexpr>
      )(delayed(_fit_and_score)(clone(base_estimator), X, y, self.scorer_,
    File "C:\ProgramData\Anaconda3\lib\site-packages\sklearn\model_selection\_search.py", line 230, in __iter__
      for v in self.param_distributions.values()])
    AttributeError: 'list' object has no attribute 'values'

 

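After looking at the fit method, I found that no matter how I adjusted the arguments passed to fit, it would not run.

If I switched to GridSearchCV, however, it ran.

Looking at the class implementations, I found that both classes end up in the same _fit method, but RandomizedSearchCV's fit passes it an extra, implicit argument: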

   

    sampled_params = ParameterSampler(self.param_distributions,
                                      self.n_iter,
                                      random_state=self.random_state)
    return self._fit(X, y, groups, sampled_params)

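Here, sampled_params is a sample drawn from the parameters that were passed in.

Those parameters are supplied when the search object is constructed:

    clf = RandomizedSearchCV(svr, tuned_parameters, cv=5, n_jobs=-1, verbose=3)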

That argument, in turn, is set by the statement:

    tuned_parameters = [{'n_neighbors': range(2, 7)},
                        {'leaf_size': range(9, 100, 3)},
                        {'p': range(1, 5)}]

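This is a list of three dictionaries. The failing line calls .values() directly on param_distributions, so in this version RandomizedSearchCV needs a single dictionary holding all three parameters, not a list:

    tuned_parameters = {'n_neighbors': range(2, 7),
                        'leaf_size': range(9, 100, 3),
                        'p': range(1, 5)}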
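As a hedged, self-contained sketch of a working call with all parameters in one dictionary: the post's data and label are not shown, so the synthetic dataset from make_classification, the n_iter=5 and cv=3 settings, and the random_state values below are assumptions made only so the snippet runs on its own. The ranges are wrapped in list(...) as a conservative choice.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the post's data/label (assumption, not the real dataset).
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# One dictionary holding every parameter, instead of a list of dictionaries.
param_distributions = {
    'n_neighbors': list(range(2, 7)),
    'leaf_size': list(range(9, 100, 3)),
    'p': list(range(1, 5)),
}

# n_iter and cv are kept small here purely to make the sketch quick to run.
search = RandomizedSearchCV(KNeighborsClassifier(), param_distributions,
                            n_iter=5, cv=3, random_state=0)
search.fit(X_train, y_train)

print(search.best_params_)
```

Each of the five sampled candidates draws one value per key, so best_params_ always contains all three parameters.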


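GridSearchCV iterates over the list and enumerates every parameter combination within each dictionary, so a list of dictionaries does not raise an error. The split still matters for the result, though: each dictionary is expanded as its own grid, so three one-key dictionaries vary one parameter at a time instead of searching the three jointly.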

    for p in self.param_grid:
        # Always sort the keys of a dictionary, for reproducibility
        items = sorted(p.items())
        if not items:
            yield {}
        else:
            keys, values = zip(*items)
            for v in product(*values):
                params = dict(zip(keys, v))
                yield params
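To make the effect of the dictionary layout concrete, here is a small standard-library sketch that follows the loop quoted above. The function name expand_grid is hypothetical and this is not scikit-learn's own code; it only mirrors the product-per-dictionary logic.

```python
from itertools import product


def expand_grid(param_grid):
    """Yield one dict per candidate; each dict in param_grid is its own grid."""
    for p in param_grid:
        items = sorted(p.items())
        if not items:
            yield {}
        else:
            keys, values = zip(*items)
            for v in product(*values):
                yield dict(zip(keys, v))


# Three one-key dictionaries: each parameter is enumerated on its own.
split_grid = [{'n_neighbors': range(2, 7)},
              {'leaf_size': range(9, 100, 3)},
              {'p': range(1, 5)}]
# One three-key dictionary: the parameters are enumerated jointly.
joint_grid = [{'n_neighbors': range(2, 7),
               'leaf_size': range(9, 100, 3),
               'p': range(1, 5)}]

n_split = len(list(expand_grid(split_grid)))  # 5 + 31 + 4 candidates
n_joint = len(list(expand_grid(joint_grid)))  # 5 * 31 * 4 candidates
print(n_split, n_joint)
```

The split layout yields 40 candidates, each setting a single parameter and leaving the others at the estimator's defaults, while the joint layout yields 620 full combinations.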

RandomizedSearchCV, however, treats what is passed in as one dictionary of distributions and samples a value for each key:

    # Always sort the keys of a dictionary, for reproducibility
    items = sorted(self.param_distributions.items())
    for _ in six.moves.range(self.n_iter):
        params = dict()
        for k, v in items:
            if hasattr(v, "rvs"):
                if sp_version < (0, 16):
                    params[k] = v.rvs()
                else:
                    params[k] = v.rvs(random_state=rnd)
            else:
                params[k] = v[rnd.randint(len(v))]
        yield params


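The sampler checks each value in the dictionary: if it has an rvs method it is treated as a distribution and sampled from; otherwise it is treated as a list and one element is picked at a random index.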
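The per-key sampling can be sketched with the standard library alone. This is an illustration under stated assumptions, not scikit-learn's implementation: the class UniformInt is a hypothetical stand-in for a scipy.stats distribution, sample_params is a hypothetical name, and random.Random replaces the NumPy random state used in the real code.

```python
import random


class UniformInt:
    """Hypothetical stand-in for a scipy.stats distribution: exposes rvs()."""

    def __init__(self, low, high):
        self.low, self.high = low, high

    def rvs(self, random_state):
        # random.Random.randint includes both endpoints.
        return random_state.randint(self.low, self.high)


def sample_params(param_distributions, n_iter, seed=0):
    """Yield n_iter parameter dicts, drawing one value per key."""
    rnd = random.Random(seed)
    # Sort the keys for reproducibility, as in the quoted code.
    items = sorted(param_distributions.items())
    for _ in range(n_iter):
        params = {}
        for key, value in items:
            if hasattr(value, "rvs"):
                # Distribution-like object: ask it for a sample.
                params[key] = value.rvs(random_state=rnd)
            else:
                # Plain sequence: pick one element at a random index.
                params[key] = value[rnd.randrange(len(value))]
        yield params


samples = list(sample_params({'n_neighbors': UniformInt(2, 6),
                              'leaf_size': range(9, 100, 3),
                              'p': [1, 2, 3, 4]}, n_iter=3))
for s in samples:
    print(s)
```

Note that the very first step, calling .items() on the argument, already requires a dictionary.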

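So the parameter set that reached the sampler was not a dictionary, which supports .values() and .items(), but a list whose elements are dictionaries. A list has neither method, so the sampler cannot read the distributions from it, hence the AttributeError.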
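The failure can be reproduced in isolation, without scikit-learn. In this minimal sketch, read_values is a hypothetical helper that only mimics the last line of the traceback, and the dictionary comprehension is one possible way (an assumption, not something from the original code) to merge the one-key dictionaries into a single dictionary.

```python
tuned_parameters = [{'n_neighbors': range(2, 7)},
                    {'leaf_size': range(9, 100, 3)},
                    {'p': range(1, 5)}]


def read_values(param_distributions):
    """Mimic the failing line of the traceback: call .values() on the argument."""
    return list(param_distributions.values())


# Passing the list reproduces the AttributeError.
try:
    read_values(tuned_parameters)
    error_message = None
except AttributeError as exc:
    error_message = str(exc)
print(error_message)

# Merge the one-key dictionaries into a single dictionary.
merged = {key: value for d in tuned_parameters for key, value in d.items()}
print(sorted(merged))
print(len(read_values(merged)))
```

If two dictionaries in the list shared a key, the later one would silently win in this merge, so it is only appropriate when the keys are distinct, as they are here.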


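The parameter set that GridSearchCV produces from the code above was shown as a screenshot, which is not preserved here.

The parameter set generated by RandomizedSearchCV could not be brought up in the debugger, so it cannot be shown.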
