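train_test_split is the most commonly used function for splitting a dataset. Its main parameters are these five:
*arrays: the data to split; usually two arrays are passed, the feature matrix X and the target y
test_size: the size of the test set, either a float between 0 and 1 (a proportion of the samples) or an int (an absolute number of samples)
random_state: the random seed, which makes the shuffle reproducible
shuffle: whether to shuffle the data before splitting (True by default)
stratify: the labels used for stratified sampling; usually the target y of a classification problem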
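from sklearn.model_selection import train_test_split
from sklearn import datasets
# Note: load_boston was removed in scikit-learn 1.2, so this line needs an older version.
boston = datasets.load_boston()
X = boston.data
y = boston.target
# Hold out 30% of the samples as the test set.
x_train,x_test,y_train,y_test = train_test_split(X,y,test_size=0.3)
#x_train,x_test,y_train,y_test = train_test_split(X,y,test_size=0.3,stratify=y) # stratified sampling that follows the class proportions of y; usually used for classification problems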
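The Boston target is continuous, so the stratify option is easier to see on a classification dataset. The following is a minimal sketch, not part of the original example: it assumes the Iris dataset (load_iris) as a stand-in and an arbitrary seed of 42, and shows how stratify, random_state and shuffle affect the split.

```python
from collections import Counter

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Assumption: Iris is used here only for illustration.
# It has 150 samples in 3 classes of 50 samples each, ordered by class.
X, y = load_iris(return_X_y=True)

# stratify=y keeps the class proportions of y in both splits;
# random_state fixes the shuffle so the split is reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)
print(X_train.shape, X_test.shape)  # (105, 4) (45, 4)
print(Counter(y_test.tolist()))     # 15 test samples for each class

# Calling again with the same random_state gives the identical split.
_, X_test_again, _, _ = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)
assert (X_test == X_test_again).all()

# shuffle=False splits in order: the last 30% of the rows become the test set.
# Because Iris is ordered by class, that test set contains only one class.
_, _, _, y_tail = train_test_split(X, y, test_size=0.3, shuffle=False)
print(Counter(y_tail.tolist()))
```

On ordered data like this, shuffle=False gives a test set that does not represent the classes at all, which is why the default shuffle=True (optionally combined with stratify) is usually the safer choice for classification.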