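Applies Batch Normalization over a mini-batch of inputs: a 2D or 3D input for BatchNorm1d, and a 4D input (a mini-batch of 3D data) for BatchNorm2d.
Within each mini-batch, the mean and standard deviation are computed separately for each feature (channel) over the batch. gamma and beta are learnable parameter vectors of size C, where C is the number of features (channels) of the input.
During training, the layer computes the mean and variance of each input batch and keeps a moving average of them. The default momentum of the moving average is 0.1.
During evaluation, the mean and variance estimated during training are used to normalize the evaluation data.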
Parameters:
Shape (BatchNorm1d): - Input: (N, C) or (N, C, L) - Output: (N, C) or (N, C, L) (same shape as the input)
- import torch
- import torch.nn as nn
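- x=torch.tensor([[1.,2.,3.,4.],
-                 [2.,3.,4.,5.],
-                 [4.,5.,6.,7.]])
- # affine=False means the layer has no learnable gamma and beta
- m=nn.BatchNorm1d(4,momentum=0.1,affine=False)
- y1=m(x)
- # x.std(0) divides by N-1, so x.std(0)**2*2/3 is the biased variance for N=3
- y2=(x-x.mean(0))/(x.std(0)**2*2/3+1e-5)**(1/2)
- # y1 and y2 are the same
- print(y1)
- print(y2)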
-
- out:
- tensor([[-1.0690, -1.0690, -1.0690, -1.0690],
- [-0.2673, -0.2673, -0.2673, -0.2673],
- [ 1.3363, 1.3363, 1.3363, 1.3363]])
- tensor([[-1.0690, -1.0690, -1.0690, -1.0690],
- [-0.2673, -0.2673, -0.2673, -0.2673],
- [ 1.3363, 1.3363, 1.3363, 1.3363]])
-
- print(m)
-
- out:
- BatchNorm1d(4, eps=1e-05, momentum=0.1, affine=False, track_running_stats=True)
-
- print(m.weight,m.bias)
-
- out:
- None None
-
- m=nn.BatchNorm1d(4,momentum=0.1,affine=True)
- print(m.weight,m.bias)
-
- # The number of parameters equals the number of features
- out:
- Parameter containing:
- tensor([1., 1., 1., 1.], requires_grad=True)
- Parameter containing:
- tensor([0., 0., 0., 0.], requires_grad=True)
-
-
- m.reset_parameters()
-
-
- print(m.running_mean,m.running_var)
-
- out:
- tensor([0., 0., 0., 0.]) tensor([1., 1., 1., 1.])
-
- m(x)
- print(m.running_mean,m.running_var)
-
- out:
- tensor([0.2333, 0.3333, 0.4333, 0.5333]) tensor([1.1333, 1.1333, 1.1333, 1.1333])
-
- m(x)
- print(m.running_mean,m.running_var)
-
- out:
- tensor([0.4433, 0.6333, 0.8233, 1.0133]) tensor([1.2533, 1.2533, 1.2533, 1.2533])
-
- m(x)
- print(m.running_mean,m.running_var)
-
- out:
- tensor([0.6323, 0.9033, 1.1743, 1.4453]) tensor([1.3613, 1.3613, 1.3613, 1.3613])
-
-
- m.eval()
- m(x)
- print(m.running_mean,m.running_var)
- out:
- tensor([0.6323, 0.9033, 1.1743, 1.4453]) tensor([1.3613, 1.3613, 1.3613, 1.3613])
-
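BatchNorm1d computes the mean and variance along the N (batch) dimension. The output of the BatchNorm1d object matches (x-x.mean(0))/(x.std(0)**2*2/3+1e-5)**(1/2). The factor 2/3 is needed because PyTorch's std divides by N-1 by default, while batch normalization normalizes with the biased variance, which divides by N. Multiplying by (N-1)/N, which is 2/3 for N=3, converts one into the other.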
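The relation between the two variance estimates can be checked directly. The following is a minimal sketch that reuses the 3x4 example tensor; the variable names (`var_unbiased`, `var_biased`, `y_manual`, `y_layer`) are chosen here for illustration only.

```python
import torch
import torch.nn as nn

x = torch.tensor([[1., 2., 3., 4.],
                  [2., 3., 4., 5.],
                  [4., 5., 6., 7.]])
n = x.shape[0]  # batch size N = 3

# Unbiased variance (divides by N-1); this is what x.std(0)**2 gives by default
var_unbiased = x.var(0, unbiased=True)
# Biased variance (divides by N); this is what BatchNorm uses to normalize a batch
var_biased = x.var(0, unbiased=False)

# The two estimates differ by the factor (N-1)/N
print(torch.allclose(var_biased, var_unbiased * (n - 1) / n))

# Normalizing by hand with the biased variance reproduces the layer output
m = nn.BatchNorm1d(4, affine=False)
y_layer = m(x)
y_manual = (x - x.mean(0)) / torch.sqrt(var_biased + m.eps)
print(torch.allclose(y_layer, y_manual, atol=1e-6))
```

Using `x.var(0, unbiased=False)` avoids hard-coding the 2/3 factor, so the same expression works for any batch size.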
When affine is set to False, the layer has no gamma and beta parameters. affine defaults to True.
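To see what gamma and beta do when affine=True, the sketch below sets them to arbitrary values and compares the layer output with gamma * x_hat + beta computed by hand. The values 2.0 and 0.5 are hypothetical, picked only for illustration.

```python
import torch
import torch.nn as nn

x = torch.tensor([[1., 2., 3., 4.],
                  [2., 3., 4., 5.],
                  [4., 5., 6., 7.]])

m = nn.BatchNorm1d(4, affine=True)

# Hypothetical gamma/beta values, chosen only to make their effect visible
with torch.no_grad():
    m.weight.fill_(2.0)  # gamma
    m.bias.fill_(0.5)    # beta

# Normalized input, using the biased batch variance
x_hat = (x - x.mean(0)) / torch.sqrt(x.var(0, unbiased=False) + m.eps)
y_manual = 2.0 * x_hat + 0.5

y_layer = m(x).detach()
print(torch.allclose(y_layer, y_manual, atol=1e-6))
```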
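Exponentially weighted average: with momentum m, running estimate v_t and the statistic x_t of the t-th batch, v_t=(1-m)*v_{t-1}+m*x_t. running_mean starts from v_0=0, so v_1=m*x_1, v_2=(1-m)*v_1+m*x_2, v_3=(1-m)*v_2+m*x_3.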
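momentum=0.1 means that when a new batch arrives, its statistics are combined with the previous running_mean and running_var by an exponentially weighted average; running_mean and running_var hold the current estimates of the mean and variance. For the first feature the batch mean is 2.3333, so 0.2333=0.1*2.3333, 0.4433=(1-0.1)*0.2333+0.1*2.3333, and 0.6323=(1-0.1)*0.4433+0.1*2.3333. running_var starts at 1 and is updated with the unbiased batch variance (2.3333 here), so 1.1333=(1-0.1)*1+0.1*2.3333. Calling m.reset_parameters() resets the running statistics and, when affine=True, gamma and beta to their initial values.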
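The running-statistics update can be reproduced by hand. The following is a minimal sketch, assuming the same example tensor is fed three times; `running_mean` and `running_var` here are plain tensors kept alongside the layer, not part of its API.

```python
import torch
import torch.nn as nn

x = torch.tensor([[1., 2., 3., 4.],
                  [2., 3., 4., 5.],
                  [4., 5., 6., 7.]])

momentum = 0.1
m = nn.BatchNorm1d(4, momentum=momentum, affine=False)

# Manual copies of the running statistics, starting from the layer's initial values
running_mean = torch.zeros(4)
running_var = torch.ones(4)

for step in range(3):
    m(x)  # training mode: the layer updates m.running_mean and m.running_var
    running_mean = (1 - momentum) * running_mean + momentum * x.mean(0)
    # The running variance is updated with the unbiased (divide by N-1) batch variance
    running_var = (1 - momentum) * running_var + momentum * x.var(0, unbiased=True)

print(m.running_mean, running_mean)
print(m.running_var, running_var)
```

Note the asymmetry: the batch is normalized with the biased variance, while the running variance accumulates the unbiased one.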
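After m.eval() or m.train(mode=False), the BatchNorm layer's running statistics (running_mean and running_var) no longer change, and the layer normalizes with them instead of the batch statistics. eval() by itself does not freeze gamma and beta: they remain trainable parameters and change only when an optimizer updates them.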
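The behavior in evaluation mode can be checked with a short sketch: the running statistics stay fixed, the output is computed from them rather than from the batch, and the affine parameters still require gradients. Variable names such as `mean_before` and `stats_unchanged` are illustrative only.

```python
import torch
import torch.nn as nn

x = torch.tensor([[1., 2., 3., 4.],
                  [2., 3., 4., 5.],
                  [4., 5., 6., 7.]])

m = nn.BatchNorm1d(4)  # affine=True by default, gamma=1 and beta=0 initially
m(x)                   # one step in training mode updates the running statistics

m.eval()
mean_before = m.running_mean.clone()
var_before = m.running_var.clone()

y_eval = m(x).detach()

# The running statistics are not updated in evaluation mode
stats_unchanged = (torch.equal(m.running_mean, mean_before)
                   and torch.equal(m.running_var, var_before))
print(stats_unchanged)

# The output uses the running statistics, not the statistics of the current batch
y_manual = (x - m.running_mean) / torch.sqrt(m.running_var + m.eps)
print(torch.allclose(y_eval, y_manual, atol=1e-6))

# gamma and beta are still trainable parameters after eval()
print(m.weight.requires_grad, m.bias.requires_grad)
```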
Shape (BatchNorm2d): - Input: (N, C, H, W) - Output: (N, C, H, W) (same shape as the input)
- x=torch.arange(1.,17.)
- x=x.reshape(2,2,2,2)
- print(x)
- print(x.shape)
-
- out:
- tensor([[[[ 1., 2.],
- [ 3., 4.]],
-
- [[ 5., 6.],
- [ 7., 8.]]],
-
-
- [[[ 9., 10.],
- [11., 12.]],
-
- [[13., 14.],
- [15., 16.]]]])
- torch.Size([2, 2, 2, 2])
-
- m=nn.BatchNorm2d(2,affine=False)
- y1=m(x)
-
- print(y1)
-
- out:
- tensor([[[[-1.3242, -1.0835],
- [-0.8427, -0.6019]],
-
- [[-1.3242, -1.0835],
- [-0.8427, -0.6019]]],
-
-
- [[[ 0.6019, 0.8427],
- [ 1.0835, 1.3242]],
-
- [[ 0.6019, 0.8427],
- [ 1.0835, 1.3242]]]])
-
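Here BatchNorm2d computes the statistics for each channel over the N, H and W dimensions. For channel 0, (1+2+3+4+9+10+11+12)/8=6.5, and the biased (divide by N) standard deviation of [1,2,3,4,9,10,11,12] is 4.1533, so (1-6.5)/4.1533=-1.3242, (2-6.5)/4.1533=-1.0835, and (9-6.5)/4.1533=0.6019.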
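The per-channel computation can be written out by hand as a check. This is a minimal sketch that reduces over dimensions 0, 2 and 3 (N, H, W) and keeps the channel dimension; `mean`, `var` and `y_manual` are names introduced here for illustration.

```python
import torch
import torch.nn as nn

x = torch.arange(1., 17.).reshape(2, 2, 2, 2)  # shape (N, C, H, W)

m = nn.BatchNorm2d(2, affine=False)
y_layer = m(x)

# Per-channel statistics: reduce over N, H and W, keep C for broadcasting
mean = x.mean(dim=(0, 2, 3), keepdim=True)
var = x.var(dim=(0, 2, 3), unbiased=False, keepdim=True)  # biased variance

y_manual = (x - mean) / torch.sqrt(var + m.eps)

print(mean.flatten())        # per-channel means
print(var.sqrt().flatten())  # per-channel biased standard deviations
print(torch.allclose(y_layer, y_manual, atol=1e-6))
```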