import torch
import torch.utils.data as Data

if __name__ == '__main__':
    torch.manual_seed(1)    # reproducible
    BATCH_SIZE = 5          # number of samples per mini-batch

    x = torch.linspace(11, 20, 10)  # x data: tensor([11., 12., ..., 20.])
    y = torch.linspace(1, 10, 10)   # y data: tensor([ 1.,  2., ..., 10.])

    # First wrap the tensors in a Dataset that torch can work with.
    # Iterating it yields (tensor(x1), tensor(y1)), (tensor(x2), tensor(y2)), ...
    torch_dataset = Data.TensorDataset(x, y)

    # Put the dataset into a DataLoader
    loader = Data.DataLoader(
        dataset=torch_dataset,  # torch TensorDataset format
        batch_size=BATCH_SIZE,  # mini-batch size
        shuffle=False,          # whether to shuffle the data (usually a good idea)
        num_workers=0,          # number of subprocesses used to load the data
    )

    for epoch in range(3):  # train over the whole dataset 3 times
        # each step, the loader yields one mini-batch, i.e.
        # (tensor([x1..x5]), tensor([y1..y5])), then (tensor([x6..x10]), tensor([y6..y10]))
        for step, (batch_x, batch_y) in enumerate(loader):
            # ...training would happen here...
            # print some of the data
            print('Epoch:', epoch, '| Step:', step,
                  '| batch x:', batch_x.numpy(), '| batch y:', batch_y.numpy())
The first parameter, dataset, is the only required one; every other DataLoader parameter is optional.
Note that some DataLoader parameters are mutually exclusive, mainly where custom samplers are involved: sampler cannot be combined with shuffle, and batch_sampler cannot be combined with batch_size, shuffle, sampler, or drop_last.
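As a minimal sketch of this exclusivity (reusing the tensors from the example above), a custom sampler replaces shuffle rather than combining with it:
import torch
import torch.utils.data as Data
from torch.utils.data import RandomSampler

x = torch.linspace(11, 20, 10)
y = torch.linspace(1, 10, 10)
dataset = Data.TensorDataset(x, y)

# OK: a custom sampler, with shuffle left at its default (False)
loader = Data.DataLoader(dataset, batch_size=5, sampler=RandomSampler(dataset))

# Not OK: DataLoader raises a ValueError, because sampler is
# mutually exclusive with shuffle
# loader = Data.DataLoader(dataset, batch_size=5, shuffle=True,
#                          sampler=RandomSampler(dataset))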
sampler:
SequentialSampler is a sampler that samples in sequential order. It takes a dataset as its argument (in fact any object with a length works) and yields the dataset's indices one after another:
from torch.utils.data import SequentialSampler

pseudo_dataset = list(range(10, 20))
for data in SequentialSampler(pseudo_dataset):
    print(data, end=" ")
0 1 2 3 4 5 6 7 8 9
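Note that the output is the indices 0 through 9, not the values 10 through 19: a sampler yields indices into its data source. A small sketch (assuming default DataLoader behavior) of looking those indices up, and of the fact that a DataLoader with shuffle=False uses a SequentialSampler internally:
from torch.utils.data import DataLoader, SequentialSampler

pseudo_dataset = list(range(10, 20))

# map sampled indices back to values
indices = list(SequentialSampler(pseudo_dataset))
print([pseudo_dataset[i] for i in indices])   # [10, 11, ..., 19]

# shuffle=False (the default) corresponds to a SequentialSampler
loader = DataLoader(pseudo_dataset, batch_size=3)
print(type(loader.sampler).__name__)          # SequentialSampler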
RandomSampler is a random sampler: it returns indices in random order. Its first argument is again a dataset (or other object with a length), and it additionally accepts replacement and num_samples, used as follows:
from torch.utils.data import RandomSampler

pseudo_dataset = list(range(10, 20))
randomSampler1 = RandomSampler(pseudo_dataset)
randomSampler2 = RandomSampler(pseudo_dataset, replacement=True, num_samples=20)

print("for random sampler #1: ")
for data in randomSampler1:
    print(data, end=" ")

print("\n\nfor random sampler #2: ")
for data in randomSampler2:
    print(data, end=" ")
for random sampler #1:
4 5 2 9 3 0 6 8 7 1
for random sampler #2:
4 9 0 6 9 3 1 6 1 8 5 0 2 7 2 8 6 4 0 6
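The printed values are again indices. A small sketch (illustrative, not part of the original example) of mapping them back to data, and of the fact that shuffle=True is implemented with a RandomSampler internally:
from torch.utils.data import DataLoader, RandomSampler

pseudo_dataset = list(range(10, 20))

# map sampled indices back to values: a random permutation of 10..19
print([pseudo_dataset[i] for i in RandomSampler(pseudo_dataset)])

# shuffle=True corresponds to a RandomSampler under the hood
loader = DataLoader(pseudo_dataset, batch_size=3, shuffle=True)
print(type(loader.sampler).__name__)  # RandomSampler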
WeightedRandomSampler takes the same replacement and num_samples parameters as RandomSampler, but it no longer receives a dataset: its first argument is weights, a list of sampling weights. It draws indices at random from list(range(len(weights))) according to those weights. In other words, WeightedRandomSampler never sees the sample set itself, only an index range derived from the length of weights, so the sampled indices usually need further processing before they can be used. The weights do not need to sum to 1.
from torch.utils.data import WeightedRandomSampler

weights = [1, 1, 10, 10]
weightedRandomSampler = WeightedRandomSampler(weights, replacement=True, num_samples=20)
for data in weightedRandomSampler:
    print(data, end=" ")
2 2 2 3 2 2 3 2 3 3 1 3 2 2 1 3 3 2 3 3
For detailed usage, see: WeightedRandomSampler usage example.
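As a sketch of that further processing (the samples list here is hypothetical), each sampled index is looked up in a sample list aligned with weights:
from torch.utils.data import WeightedRandomSampler

samples = ["a", "b", "c", "d"]   # hypothetical data, one item per weight
weights = [1, 1, 10, 10]         # weights[i] is the weight of samples[i]
sampler = WeightedRandomSampler(weights, num_samples=20, replacement=True)
print([samples[i] for i in sampler])  # mostly "c" and "d"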
The samplers above all return a single index per iteration. BatchSampler wraps such a one-index-at-a-time sampler and returns a whole group of indices at once, according to the configured batch size, so its parameters differ somewhat from the samplers above. (The example below passes a plain list as the first argument, which works because BatchSampler simply iterates over whatever it wraps; that is why it emits values rather than indices here.)
from torch.utils.data import BatchSampler

pseudo_dataset = list(range(10, 20))
batchSampler1 = BatchSampler(pseudo_dataset, batch_size=3, drop_last=False)
batchSampler2 = BatchSampler(pseudo_dataset, batch_size=3, drop_last=True)

print("for batch sampler #1: ")
for data in batchSampler1:
    print(data, end=" ")

print("\n\nfor batch sampler #2: ")
for data in batchSampler2:
    print(data, end=" ")
for batch sampler #1:
[10, 11, 12] [13, 14, 15] [16, 17, 18] [19]
for batch sampler #2:
[10, 11, 12] [13, 14, 15] [16, 17, 18]
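Inside a DataLoader, the usual pattern (a sketch assuming standard DataLoader semantics) is to wrap an index sampler and pass the result via the batch_sampler parameter, which in turn excludes batch_size, shuffle, sampler, and drop_last:
from torch.utils.data import DataLoader, BatchSampler, RandomSampler

pseudo_dataset = list(range(10, 20))
batch_sampler = BatchSampler(RandomSampler(pseudo_dataset),
                             batch_size=3, drop_last=False)
loader = DataLoader(pseudo_dataset, batch_sampler=batch_sampler)
for batch in loader:
    print(batch)  # e.g. tensor([17, 12, 15]), four batches in total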
SubsetRandomSampler randomly samples from a given subset of indices. It is mostly used when splitting a dataset into several sets, for example a training set and a validation set:
from torch.utils.data import SubsetRandomSampler

pseudo_dataset = list(range(10, 20))
subRandomSampler1 = SubsetRandomSampler(pseudo_dataset[:7])
subRandomSampler2 = SubsetRandomSampler(pseudo_dataset[7:])

print("for subset random sampler #1: ")
for data in subRandomSampler1:
    print(data, end=" ")

print("\n\nfor subset random sampler #2: ")
for data in subRandomSampler2:
    print(data, end=" ")
for subset random sampler #1:
14 15 11 16 13 10 12
for subset random sampler #2:
17 19 18
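A sketch of the train/validation use case (the dataset and the 80/20 split are hypothetical): shuffle the index positions once, then give each loader its own subset:
import torch
from torch.utils.data import DataLoader, SubsetRandomSampler

dataset = list(range(100))                      # stand-in for a real Dataset
indices = torch.randperm(len(dataset)).tolist() # shuffled index positions
split = int(0.8 * len(dataset))                 # 80/20 split

train_loader = DataLoader(dataset, batch_size=10,
                          sampler=SubsetRandomSampler(indices[:split]))
val_loader = DataLoader(dataset, batch_size=10,
                        sampler=SubsetRandomSampler(indices[split:]))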
Reference: https://blog.csdn.net/qq_38962621/article/details/111146427