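First, the KFold function, which implements k-fold cross-validation. With k = n_splits, you get k train/test pairs. The dataset is divided into k roughly equal parts by sample index (the labels play no role); each part serves once as the test set while the remaining k-1 parts form the training set.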
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold
X = np.array([[1, 2], [3, 4], [1, 2], [3, 4], [5, 9], [1, 5], [3, 9], [5, 8], [1, 1], [1, 4]])
y = np.array([0, 1, 1, 1, 0, 0, 1, 0, 0, 0])
# random_state only takes effect with shuffle=True
# (recent scikit-learn versions raise an error otherwise)
kf = KFold(n_splits=2, shuffle=True, random_state=2020)
# split() only needs the data; the labels are not required
for train_index, test_index in kf.split(X):
    print("TRAIN:", train_index, "TEST:", test_index)
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
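To make it concrete that KFold ignores the labels, here is a small sketch that counts the classes in each test fold. It reuses the y array from above; the all-zeros X is just a placeholder (an assumption for illustration), since KFold only looks at the number of rows. Shuffling is left off so the result is deterministic.

```python
import numpy as np
from sklearn.model_selection import KFold

y = np.array([0, 1, 1, 1, 0, 0, 1, 0, 0, 0])  # six 0s, four 1s
X = np.zeros((len(y), 2))  # placeholder features; only the row count matters to KFold

kf = KFold(n_splits=2)  # no shuffling: test folds are contiguous index blocks

kfold_test_indices = []
kfold_class_counts = []
for train_index, test_index in kf.split(X):
    # Count how many 0s and 1s ended up in this test fold
    counts = np.bincount(y[test_index], minlength=2)
    kfold_test_indices.append(test_index.tolist())
    kfold_class_counts.append(counts.tolist())
    print("TEST:", test_index, "class counts [0s, 1s]:", counts)
```

The two test folds are simply the first and second half of the indices, so their class balance differs (two 0s and three 1s versus four 0s and one 1).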
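The StratifiedKFold function
As with KFold, n_splits sets the number of folds. The difference is that the split is stratified by label, much like stratified sampling: each fold keeps roughly the same class proportions as the full dataset, which is not the same as drawing an equal number of samples from each label. With the y above (six 0s and four 1s) and n_splits=2, each test fold gets three 0s and two 1s. Because the folds are built per class, split() must be given y as well as X.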
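A minimal sketch of StratifiedKFold on the same data as the KFold example, checking the class counts in each test fold. The variable name fold_class_counts is only there for illustration.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

X = np.array([[1, 2], [3, 4], [1, 2], [3, 4], [5, 9],
              [1, 5], [3, 9], [5, 8], [1, 1], [1, 4]])
y = np.array([0, 1, 1, 1, 0, 0, 1, 0, 0, 0])  # six 0s, four 1s

skf = StratifiedKFold(n_splits=2, shuffle=True, random_state=2020)

fold_class_counts = []
# Unlike KFold, split() needs y here because folds are assigned class by class
for train_index, test_index in skf.split(X, y):
    counts = np.bincount(y[test_index], minlength=2)
    fold_class_counts.append(counts.tolist())
    print("TRAIN:", train_index, "TEST:", test_index,
          "test class counts [0s, 1s]:", counts)
```

Which samples land in which fold depends on random_state, but the per-fold class counts stay the same: three 0s and two 1s in each test fold, matching the 6:4 ratio of the full label set.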