StratifiedKFold: stratified k-fold sampling
sklearn.model_selection.StratifiedKFold(n_splits=5, *, shuffle=False, random_state=None)
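Constructor parameters:
n_splits: integer k >= 2, the number of folds (default 5).
shuffle: bool, whether to shuffle the samples within each class before dividing them into folds (default False).
random_state: an integer seed, a RandomState instance, or None; it controls the shuffling and only has an effect when shuffle=True (recent scikit-learn versions raise a ValueError if it is set while shuffle=False).
split(X, y, groups=None): generates (train_index, test_index) pairs, one per fold, that divide the data into a training set and a test set.
X: training data, shape (n_samples, n_features).
y: class labels, shape (n_samples,); the folds are stratified on these labels, so y is required.
groups: ignored; present only for API compatibility.
In the older sklearn.cross_validation API, the labels y were passed to the constructor instead (StratifiedKFold(y, n_folds=3, ...)), the fold count was called n_folds, and the plain KFold took the dataset size n as its first argument.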
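The parameters above matter most when the classes are imbalanced. The sketch below is a hypothetical example (the 16/4 label split and the all-zero X are made up for illustration): it uses shuffle=True with a fixed random_state, counts the classes in every fold, and checks that a second splitter with the same seed reproduces the same folds.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Hypothetical imbalanced data: 16 samples of class 0 and 4 of class 1 (20% minority).
# The splitter only looks at len(X) and y, so the feature values do not matter here.
X = np.zeros((20, 3))
y = np.array([0] * 16 + [1] * 4)

skf = StratifiedKFold(n_splits=4, shuffle=True, random_state=0)
print("folds:", skf.get_n_splits())

for fold, (train_idx, test_idx) in enumerate(skf.split(X, y)):
    # np.bincount(..., minlength=2) gives [count of class 0, count of class 1]
    print(fold, "test:", np.bincount(y[test_idx], minlength=2),
          "train:", np.bincount(y[train_idx], minlength=2))

# A second splitter with the same integer seed yields identical test folds
again = StratifiedKFold(n_splits=4, shuffle=True, random_state=0)
same = all(np.array_equal(t1, t2)
           for (_, t1), (_, t2) in zip(skf.split(X, y), again.split(X, y)))
print("reproducible:", same)
```

Every test fold receives 4 class-0 samples and 1 class-1 sample, so each fold keeps the 20% minority share of y; shuffling only changes which samples land in which fold, not the per-class counts.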
```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

X = np.array([[1, 2, 3, 4],
              [11, 12, 13, 14],
              [21, 22, 23, 24],
              [31, 32, 33, 34],
              [41, 42, 43, 44],
              [51, 52, 53, 54],
              [61, 62, 63, 64],
              [71, 72, 73, 74]])

y = np.array([1, 1, 0, 0, 1, 1, 0, 0])

# Plain K-fold split: consecutive blocks of samples, labels are ignored
folder = KFold(n_splits=4, shuffle=False)
for train_index, test_index in folder.split(X, y):
    print("KFold split:")
    print("Train Index:", train_index)
    print("Test Index:", test_index)
    print("y_train:", y[train_index])
    print("y_test:", y[test_index])
    print("")

# Stratified K-fold split: every fold keeps the class proportions of y
stratified_folder = StratifiedKFold(n_splits=4, shuffle=False)
for train_index, test_index in stratified_folder.split(X, y):
    print("StratifiedKFold split:")
    print("Stratified Train Index:", train_index)
    print("Stratified Test Index:", test_index)
    print("Stratified y_train:", y[train_index])
    print("Stratified y_test:", y[test_index])
    print("")
```
Output:

```
KFold split:
Train Index: [2 3 4 5 6 7]
Test Index: [0 1]
y_train: [0 0 1 1 0 0]
y_test: [1 1]

KFold split:
Train Index: [0 1 4 5 6 7]
Test Index: [2 3]
y_train: [1 1 1 1 0 0]
y_test: [0 0]

KFold split:
Train Index: [0 1 2 3 6 7]
Test Index: [4 5]
y_train: [1 1 0 0 0 0]
y_test: [1 1]

KFold split:
Train Index: [0 1 2 3 4 5]
Test Index: [6 7]
y_train: [1 1 0 0 1 1]
y_test: [0 0]

StratifiedKFold split:
Stratified Train Index: [1 3 4 5 6 7]
Stratified Test Index: [0 2]
Stratified y_train: [1 0 1 1 0 0]
Stratified y_test: [1 0]

StratifiedKFold split:
Stratified Train Index: [0 2 4 5 6 7]
Stratified Test Index: [1 3]
Stratified y_train: [1 0 1 1 0 0]
Stratified y_test: [1 0]

StratifiedKFold split:
Stratified Train Index: [0 1 2 3 5 7]
Stratified Test Index: [4 6]
Stratified y_train: [1 1 0 0 1 0]
Stratified y_test: [1 0]

StratifiedKFold split:
Stratified Train Index: [0 1 2 3 4 6]
Stratified Test Index: [5 7]
Stratified y_train: [1 1 0 0 1 0]
Stratified y_test: [1 0]
```

With plain KFold, every test fold contains a single class (y_test is [1 1] or [0 0]), while every StratifiedKFold test fold contains one sample of each class, matching the 1:1 class ratio of y.
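To show the idea behind stratification without scikit-learn, here is a simplified sketch: within each class, the samples are dealt out to the folds in turn, like cards. The helper names round_robin_stratified_folds and round_robin_split are hypothetical, and this is not scikit-learn's exact allocation algorithm (its fold assignment can differ on other data), although on the 8-sample y used above it yields the same test folds as the StratifiedKFold output.

```python
import numpy as np

def round_robin_stratified_folds(y, n_splits):
    """Hypothetical helper: give each sample a test-fold id so that every
    class is spread as evenly as possible across the folds.
    Simplified sketch, not scikit-learn's exact algorithm."""
    y = np.asarray(y)
    fold_of = np.empty(len(y), dtype=int)
    for label in np.unique(y):
        idx = np.flatnonzero(y == label)               # positions of this class, in order
        fold_of[idx] = np.arange(len(idx)) % n_splits  # deal them out fold by fold
    return fold_of

def round_robin_split(y, n_splits):
    """Hypothetical helper: yield (train_index, test_index) pairs, one per fold."""
    fold_of = round_robin_stratified_folds(y, n_splits)
    all_idx = np.arange(len(y))
    for k in range(n_splits):
        yield all_idx[fold_of != k], all_idx[fold_of == k]

y = np.array([1, 1, 0, 0, 1, 1, 0, 0])
for train_idx, test_idx in round_robin_split(y, 4):
    print("Train:", train_idx, "Test:", test_idx, "y_test:", y[test_idx])
```

If a class has fewer samples than n_splits, some test folds end up without that class at all; StratifiedKFold issues a warning in that situation.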