BatchNorm in PyTorch

Author: 凡人多烦事01 | 2024-05-22 04:31:55


class torch.nn.BatchNorm1d(num_features, eps=1e-05, momentum=0.1, affine=True)

Applies Batch Normalization over a 2D or 3D input, i.e. a mini-batch of 1D inputs with an optional extra channel dimension.

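For each mini-batch, the mean and standard deviation are computed separately for every feature (channel). gamma and beta are learnable parameter vectors of size C, where C is the number of features of the input.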
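The PyTorch documentation writes the per-feature transform as below. In training mode, $\mathrm{E}[x]$ and $\mathrm{Var}[x]$ are the batch mean and the biased batch variance (dividing by N), computed separately for each of the C features; with affine=False the layer behaves as if $\gamma = 1$ and $\beta = 0$.

```latex
y = \frac{x - \mathrm{E}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}} \cdot \gamma + \beta
```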

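During training, the layer computes the mean and variance of each incoming batch and keeps a moving average of them in the running_mean and running_var buffers. The default momentum of this moving average is 0.1.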

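During evaluation, the running mean and variance accumulated during training are used to normalize the data instead of the batch statistics.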
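To make the training/evaluation difference concrete, here is a framework-free sketch for (N, C) input. The class SimpleBatchNorm1d is hypothetical and is not PyTorch's code; it only mirrors the documented behaviour (biased variance for normalizing, unbiased variance for the running estimate, running statistics in eval mode) and omits gamma and beta, as with affine=False.

```python
import math


class SimpleBatchNorm1d:
    """Hypothetical, framework-free sketch of BatchNorm1d(affine=False) for (N, C) input.

    Not PyTorch's implementation: batch statistics in training mode,
    running statistics in eval mode. Assumes N > 1 when training.
    """

    def __init__(self, num_features, eps=1e-5, momentum=0.1):
        self.eps = eps
        self.momentum = momentum
        self.training = True
        # Initial buffer values, matching PyTorch's defaults.
        self.running_mean = [0.0] * num_features
        self.running_var = [1.0] * num_features

    def __call__(self, x):
        n, num_features = len(x), len(x[0])
        out = [[0.0] * num_features for _ in range(n)]
        for c in range(num_features):
            col = [row[c] for row in x]
            if self.training:
                mean = sum(col) / n
                sq = sum((v - mean) ** 2 for v in col)
                var = sq / n  # biased variance: used to normalize this batch
                m = self.momentum
                # Running estimates are updated with the unbiased variance.
                self.running_mean[c] = (1 - m) * self.running_mean[c] + m * mean
                self.running_var[c] = (1 - m) * self.running_var[c] + m * sq / (n - 1)
            else:
                mean, var = self.running_mean[c], self.running_var[c]
            denom = math.sqrt(var + self.eps)
            for i in range(n):
                out[i][c] = (x[i][c] - mean) / denom
        return out


x = [[1, 2, 3, 4], [2, 3, 4, 5], [4, 5, 6, 7]]
bn = SimpleBatchNorm1d(4)
y = bn(x)
print([round(v, 4) for v in y[0]])             # [-1.069, -1.069, -1.069, -1.069]
print([round(v, 4) for v in bn.running_mean])  # [0.2333, 0.3333, 0.4333, 0.5333]
bn.training = False  # plays the role of m.eval()
bn(x)                # eval mode: running statistics are left unchanged
```

The two variances are kept apart on purpose: the batch is normalized with the biased estimate, while the stored running_var receives the unbiased one, which is why running_var drifts from 1 toward 2.3333 rather than toward 1.5556.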

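Parameters:

num_features: C, the number of features of the expected input of size (N, C) or (N, C, L).
eps: a value added to the denominator for numerical stability, so that it never approaches or equals 0. Default: 1e-5.
momentum: the value used for the running_mean and running_var computation. Default: 0.1.
affine: a boolean; when True, the layer has learnable affine parameters (gamma and beta). Default: True.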

Shape:
- Input: (N, C) or (N, C, L)
- Output: (N, C) or (N, C, L) (same shape as the input)
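For (N, C, L) input, the statistics of each channel are pooled over both the batch dimension N and the length dimension L. A small pure-Python sketch of that reduction; channel_stats_3d is a hypothetical helper, not a PyTorch function:

```python
def channel_stats_3d(x):
    """Hypothetical helper: per-channel (mean, biased variance) of a nested list shaped (N, C, L).

    Every value x[n][c][l] of channel c is pooled, i.e. the reduction runs over N and L,
    which is how BatchNorm1d treats 3D input.
    """
    num_channels = len(x[0])
    stats = []
    for c in range(num_channels):
        values = [v for sample in x for v in sample[c]]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        stats.append((mean, var))
    return stats


# N=2 samples, C=2 channels, L=3 positions
x = [
    [[1, 2, 3], [10, 20, 30]],
    [[4, 5, 6], [40, 50, 60]],
]
print(channel_stats_3d(x))
```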


import torch
import torch.nn as nn
x=torch.Tensor([[1,2,3,4],
               [2,3,4,5],
               [4,5,6,7]])
# affine=False: the layer has no gamma (weight) and no beta (bias)
m=nn.BatchNorm1d(4,momentum=0.1,affine=False)
y1=m(x)
y2=(x-x.mean(0))/(x.std(0)**2*2/3+1e-5)**(1/2)
# y1 and y2 are identical
print(y1)
print(y2)
 
out:
tensor([[-1.0690, -1.0690, -1.0690, -1.0690],
        [-0.2673, -0.2673, -0.2673, -0.2673],
        [ 1.3363,  1.3363,  1.3363,  1.3363]])
tensor([[-1.0690, -1.0690, -1.0690, -1.0690],
        [-0.2673, -0.2673, -0.2673, -0.2673],
        [ 1.3363,  1.3363,  1.3363,  1.3363]])
 
print(m)
 
out:
BatchNorm1d(4, eps=1e-05, momentum=0.1, affine=False, track_running_stats=True)
 
print(m.weight,m.bias)
 
out:
None None
 
m=nn.BatchNorm1d(4,momentum=0.1,affine=True)
print(m.weight,m.bias)
 
# the number of parameters matches the number of features
out:
Parameter containing:
tensor([1., 1., 1., 1.], requires_grad=True)
Parameter containing:
tensor([0., 0., 0., 0.], requires_grad=True)
 
 
m.reset_parameters()
 
 
print(m.running_mean,m.running_var)
 
out:
tensor([0., 0., 0., 0.]) tensor([1., 1., 1., 1.])
 
m(x)
print(m.running_mean,m.running_var)
 
out:
tensor([0.2333, 0.3333, 0.4333, 0.5333]) tensor([1.1333, 1.1333, 1.1333, 1.1333])
 
m(x)
print(m.running_mean,m.running_var)
 
out:
tensor([0.4433, 0.6333, 0.8233, 1.0133]) tensor([1.2533, 1.2533, 1.2533, 1.2533])
 
m(x)
print(m.running_mean,m.running_var)
 
out:
tensor([0.6323, 0.9033, 1.1743, 1.4453]) tensor([1.3613, 1.3613, 1.3613, 1.3613])
 
 
m.eval()
m(x)
print(m.running_mean,m.running_var)
out:
tensor([0.6323, 0.9033, 1.1743, 1.4453]) tensor([1.3613, 1.3613, 1.3613, 1.3613])

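BatchNorm1d computes the mean and variance along the N (batch) dimension, so the output of the BatchNorm1d module equals (x-x.mean(0))/(x.std(0)**2*2/3+1e-5)**(1/2). The factor 2/3 is needed because x.std(0) uses the unbiased estimator, dividing by N-1, whereas BatchNorm normalizes with the biased variance, dividing by N; with N=3, multiplying the squared standard deviation by (N-1)/N = 2/3 converts one into the other.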
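The N versus N-1 distinction can be checked without PyTorch using Python's statistics module, where variance divides by N-1 and pvariance divides by N. This sketch recomputes the first column of y1 from the example:

```python
import math
import statistics

col = [1, 2, 4]  # first column of x in the example above
n = len(col)
mean = statistics.mean(col)           # 7/3 = 2.3333
unbiased = statistics.variance(col)   # divides by N-1, like x.std(0)**2
biased = statistics.pvariance(col)    # divides by N, what BatchNorm normalizes with

# Rescaling the unbiased variance by (N-1)/N (2/3 here) recovers the biased one.
assert math.isclose(unbiased * (n - 1) / n, biased)

y = [(v - mean) / math.sqrt(biased + 1e-5) for v in col]
print([round(v, 4) for v in y])  # [-1.069, -0.2673, 1.3363]
```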

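When affine is False, the layer has no gamma and beta parameters (weight and bias are None). affine defaults to True, in which case weight is initialized to ones and bias to zeros.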

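Exponentially weighted moving average: let $m$ be the momentum, $v_t$ the statistic of batch $t$ and $\hat{v}_t$ the running estimate after that batch. Then $\hat{v}_1 = m\,v_1$, $\hat{v}_2 = (1-m)\,\hat{v}_1 + m\,v_2$, $\hat{v}_3 = (1-m)\,\hat{v}_2 + m\,v_3$, or in general $\hat{v}_t = (1-m)\,\hat{v}_{t-1} + m\,v_t$. The first step reduces to $m\,v_1$ only because the initial estimate is 0, which holds for running_mean; running_var starts at 1, so its first step is $(1-m) \cdot 1 + m\,v_1$.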

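With momentum=0.1, each new batch is blended into the stored estimates by this exponentially weighted average; running_mean and running_var hold the current estimates of the mean and variance. For the first column (batch mean 2.3333): 0.2333 = 0.1*2.3333, 0.4433 = (1-0.1)*0.2333 + 0.1*2.3333, 0.6323 = (1-0.1)*0.4433 + 0.1*2.3333. running_var is updated with the unbiased batch variance, which happens to be 2.3333 for every column as well, starting from 1: 1.1333 = 0.9*1 + 0.1*2.3333. Note that PyTorch's momentum is the weight given to the new batch statistic, unlike the momentum used in optimizers. Calling m.reset_parameters() resets the running statistics (running_mean to 0, running_var to 1) as well as weight and bias (to 1 and 0).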
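The three updates can be replayed by hand. This is a minimal sketch for the first column only, assuming the same batch is fed three times, as in the example; the other columns differ only in their batch mean (3.3333, 4.3333, 5.3333):

```python
# Hypothetical trace of the running-statistics update for the first column of x.
momentum = 0.1
batch_mean = 7 / 3   # mean of [1, 2, 4]: 2.3333
batch_var = 7 / 3    # unbiased variance of [1, 2, 4], also 2.3333
running_mean, running_var = 0.0, 1.0  # initial buffer values after reset_parameters()

for step in range(1, 4):
    # Same batch each time, as in the three m(x) calls of the example.
    running_mean = (1 - momentum) * running_mean + momentum * batch_mean
    running_var = (1 - momentum) * running_var + momentum * batch_var
    print(step, round(running_mean, 4), round(running_var, 4))
# 1 0.2333 1.1333
# 2 0.4433 1.2533
# 3 0.6323 1.3613
```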

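After calling m.eval() (equivalent to m.train(mode=False)), the BatchNorm layer no longer updates running_mean and running_var, and it normalizes inputs with these stored statistics instead of the current batch's. The weight and bias (gamma and beta) are never changed by the forward pass in either mode; only an optimizer updates them.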
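As a rough check of what the m(x) call after m.eval() returns, here is the eval-mode formula applied by hand to the first column. It assumes weight=1 and bias=0, as they are right after reset_parameters(), and uses the rounded running values printed in the example:

```python
import math

# Rounded first-column values printed after three training-mode calls.
running_mean = 0.6323
running_var = 1.3613
eps = 1e-5
weight, bias = 1.0, 0.0  # assumption: gamma/beta still at their reset values

col = [1, 2, 4]  # first column of x
# Eval mode: the stored estimates replace the batch mean and variance.
y_eval = [weight * (v - running_mean) / math.sqrt(running_var + eps) + bias for v in col]
print([round(v, 4) for v in y_eval])
```

Unlike the training-mode output, the result is not centered: the input value 1, which mapped to -1.0690 with batch statistics, now maps to a positive value because the running mean has only moved partway (to 0.6323) toward the batch mean 2.3333.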

class torch.nn.BatchNorm2d(num_features, eps=1e-05, momentum=0.1, affine=True)

Shape:
- Input: (N, C, H, W)
- Output: (N, C, H, W) (same shape as the input)


x=torch.Tensor(range(1,17))
x=x.reshape(2,2,2,2)
print(x)
print(x.shape)
 
out:
tensor([[[[ 1.,  2.],
          [ 3.,  4.]],
 
         [[ 5.,  6.],
          [ 7.,  8.]]],
 
 
        [[[ 9., 10.],
          [11., 12.]],
 
         [[13., 14.],
          [15., 16.]]]])
torch.Size([2, 2, 2, 2])
 
m=nn.BatchNorm2d(2,affine=False)
y1=m(x)
 
print(y1)
 
out:
tensor([[[[-1.3242, -1.0835],
          [-0.8427, -0.6019]],
 
         [[-1.3242, -1.0835],
          [-0.8427, -0.6019]]],
 
 
        [[[ 0.6019,  0.8427],
          [ 1.0835,  1.3242]],
 
         [[ 0.6019,  0.8427],
          [ 1.0835,  1.3242]]]])

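Channel 0 gathers the values [1, 2, 3, 4, 9, 10, 11, 12] from both samples, i.e. the statistics are pooled over N, H and W for each channel. Their mean is (1+2+3+4+9+10+11+12)/8 = 6.5 and their biased standard deviation (dividing by 8) is 4.1533, so (1-6.5)/4.1533 = -1.3242, (2-6.5)/4.1533 = -1.0835 and (9-6.5)/4.1533 = 0.6019. Channel 1 ([5, 6, 7, 8, 13, 14, 15, 16], mean 10.5) has exactly the same spread, so its output is identical.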
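The same per-channel computation can be reproduced without PyTorch. In this sketch, batchnorm2d_channel is a hypothetical helper that normalizes one channel of a nested (N, C, H, W) list with training-mode statistics, ignoring gamma and beta as with affine=False:

```python
import math


def batchnorm2d_channel(x, c, eps=1e-5):
    """Hypothetical helper: normalize channel c of a nested list shaped (N, C, H, W)
    using training-mode statistics pooled over N, H and W (biased variance)."""
    values = [v for sample in x for row in sample[c] for v in row]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    denom = math.sqrt(var + eps)
    # Result has shape (N, H, W) for the chosen channel.
    return [[[(v - mean) / denom for v in row] for row in sample[c]] for sample in x]


# The same (2, 2, 2, 2) input as in the example: the values 1..16 in row-major order.
x = [
    [[[1, 2], [3, 4]], [[5, 6], [7, 8]]],
    [[[9, 10], [11, 12]], [[13, 14], [15, 16]]],
]
y0 = batchnorm2d_channel(x, 0)
print([round(v, 4) for v in y0[0][0]])  # [-1.3242, -1.0835]
print([round(v, 4) for v in y0[1][0]])  # [0.6019, 0.8427]
```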

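Disclaimer: this article was contributed by a user and does not represent the position of the wpsshop blog; copyright belongs to the original author, and the site assumes no legal responsibility for it. If you find infringing content, please contact us. When reposting, please cite the source: https://www.wpsshop.cn/w/凡人多烦事01/article/detail/606292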