In machine learning, data is normalized before model training so that it follows a consistent distribution. Deep neural networks are usually trained one batch at a time rather than on the full dataset, and each batch has a slightly different distribution. This causes the internal covariate shift problem: the distribution of intermediate activations changes during training, which makes learning harder for the following layers. Batch Normalization forcibly pulls the data back to a normal distribution with mean 0 and variance 1. This both keeps the distributions consistent and helps avoid vanishing gradients, thereby speeding up the convergence of network training.
References:
Ioffe, S. and Szegedy, C. "Batch normalization: Accelerating deep network training by reducing internal covariate shift."
PyTorch踩坑指南(1)nn.BatchNorm2d()函数_白水煮蝎子的博客-CSDN博客
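As a minimal sketch of this core idea (the tensor below is purely illustrative), normalizing a batch feature-wise pulls each feature back to roughly zero mean and unit variance:

import torch

torch.manual_seed(0)
batch = torch.randn(32, 4) * 5 + 10  # features deliberately far from zero mean / unit variance

mu = batch.mean(dim=0)                        # per-feature mean
var = batch.var(dim=0, unbiased=False)        # per-feature (biased) variance
normalized = (batch - mu) / torch.sqrt(var + 1e-5)

print(normalized.mean(dim=0))                 # ~0 for every feature
print(normalized.var(dim=0, unbiased=False))  # ~1 for every feature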
<1>torch.nn.BatchNorm1d
CLASS torch.nn.BatchNorm1d(num_features, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True, device=None, dtype=None)
Parameters:
num_features (int) – number of features or channels C of the input. This is the feature dimension of each sample; the input is generally a matrix.
eps (float) – a value added to the denominator for numerical stability. Default: 1e-5. This keeps the denominator of the normalization away from zero.
momentum (float) – the value used for the running_mean and running_var computation. Can be set to None for cumulative moving average (i.e. simple average; see the sketch after this parameter list). Default: 0.1. This controls how much each new batch contributes to running_mean and running_var.
affine (bool) – a boolean value that when set to True, this module has learnable affine parameters. Default: True. This controls whether the module has the learnable affine parameters γ and β; γ is initialized to 1 and β is initialized to 0.
track_running_stats (bool) – a boolean value that when set to True, this module tracks the running mean and variance, and when set to False, this module does not track such statistics and initializes the statistics buffers running_mean and running_var as None. When these buffers are None, this module always uses batch statistics, in both training and eval modes. Default: True. This controls whether the running statistics are updated: if True, running_mean is initialized to 0 and running_var to 1; if False, running_mean and running_var are initialized to None, and the statistics of the current batch are used for normalization in both training and eval modes.
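As an aside, the momentum=None case mentioned in the parameter list above can be checked with a minimal sketch (batch sizes and values here are arbitrary): with a cumulative moving average, running_mean ends up as the plain average of all per-batch means seen so far.

import torch

m = torch.nn.BatchNorm1d(4, momentum=None)  # cumulative moving average instead of an exponential one

batch_means = []
for _ in range(3):
    x = torch.randn(16, 4)
    batch_means.append(x.mean(dim=0))
    m(x)  # each forward pass in training mode updates the running statistics

# running_mean equals the simple average of the three per-batch means
print(torch.allclose(m.running_mean, torch.stack(batch_means).mean(dim=0), atol=1e-6))  # True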
Shape:
Input: (N, C), where N is the batch size and C is the number of features or channels. The input is a matrix.
Output: (N, C) (same shape as input).
Note: the affine parameters γ and β are learned through backpropagation, whereas running_mean and running_var are accumulated statistically during the forward pass.
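A small sketch makes this distinction concrete (the loss below is arbitrary, chosen only so that γ receives a nonzero gradient): a forward pass alone already moves running_mean, while γ (weight) and β (bias) change only after a backward pass plus an optimizer step.

import torch

m = torch.nn.BatchNorm1d(4)
opt = torch.optim.SGD(m.parameters(), lr=0.1)

x = torch.randn(16, 4)
out = m(x)                    # the forward pass alone updates the buffers
print(m.running_mean)         # already changed from the initial zeros

w_before = m.weight.clone()
loss = (out ** 2).sum()       # arbitrary loss with a nonzero gradient w.r.t. gamma
loss.backward()
opt.step()                    # only now do weight (gamma) and bias (beta) move
print(torch.equal(w_before, m.weight))  # False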
References:
BatchNorm1d — PyTorch 2.0 documentation
nn.BatchNorm1d_harry_tea的博客-CSDN博客
1. How the running mean and variance are updated, and how data is normalized
Note that track_running_stats only takes effect when the BatchNorm module is constructed: the buffers running_mean and running_var are created (or set to None) at construction time, so setting the attribute afterwards does not remove or recreate them, as the following example shows:
import torch

m = torch.nn.BatchNorm1d(8, momentum=0.1, affine=True, track_running_stats=True)
m.track_running_stats = False  # setting the attribute after construction does not remove the buffers
print('running_mean:', m.running_mean)  # initial values
print('running_var:', m.running_var)
print('track_running_stats:', m.track_running_stats)

m2 = torch.nn.BatchNorm1d(8, momentum=0.1, affine=True, track_running_stats=False)
print('running_mean:', m2.running_mean)  # initial values
print('running_var:', m2.running_var)

Output:
running_mean: tensor([0., 0., 0., 0., 0., 0., 0., 0.])
running_var: tensor([1., 1., 1., 1., 1., 1., 1., 1.])
track_running_stats: False
running_mean: None
running_var: None
Whether the statistics running_mean and running_var are updated depends on the combination of the training and track_running_stats flags, and the normalization mechanism also differs with these two flags.
(1) training = True, track_running_stats = True: the model is in training mode. Every time it normalizes a batch, the model also updates running_mean and running_var, i.e. it tracks the mean and variance of each batch.
Update rule:
$\hat{x}_\text{new} = (1 - \text{momentum}) \times \hat{x} + \text{momentum} \times x_t$
where $\hat{x}$ is the model's running mean or variance, $x_t$ is the mean or variance observed on the current batch, $\hat{x}_\text{new}$ is the updated mean or variance, and momentum is the update factor.
Normalization of the observed batch:
$y = \frac{x - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}$
where $\mu_B$ is the mean of the observed batch, $\sigma_B^2$ is its variance, and $y$ is the normalized data of a given channel; that is, normalization uses the current batch's own mean and variance.
Note the unbiased estimate of the variance: $\sigma^2 = \frac{1}{N-1}\sum_{i=1}^{N}(x_i - \mu)^2$,
and the biased estimate of the variance: $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$.
Note: the running_mean and running_var updates use the unbiased variance, while data normalization uses the biased variance. When N is large, the biased and unbiased estimates are essentially identical.
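The two estimates differ only by the factor N/(N-1), which this one-off check on an arbitrary small tensor confirms:

import torch

x = torch.randn(5)
n = x.numel()
var_unbiased = x.var(unbiased=True)   # divides by N - 1
var_biased = x.var(unbiased=False)    # divides by N
print(torch.isclose(var_biased, var_unbiased * (n - 1) / n))  # True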
import torch

m = torch.nn.BatchNorm1d(8, momentum=0.1, affine=True, track_running_stats=True)
print('running_mean:', m.running_mean)  # initial values
print('running_var:', m.running_var)
print('weight:', m.weight)
print('bias:', m.bias)

input = torch.randn(5, 8)
print('input:', input)
print('input[...,0]:', input[..., 0])  # first column

obser_mean = torch.Tensor([input[..., i].mean() for i in range(8)])  # per-feature mean of the input
obser_var_unbiased = torch.Tensor([input[..., i].var() for i in range(8)])  # unbiased variance, equivalent to torch.var(input, dim=0, unbiased=True)
obser_var_biased = torch.Tensor([input[..., i].var(unbiased=False) for i in range(8)])  # biased variance, equivalent to torch.var(input, dim=0, unbiased=False)
print('obser_mean:', obser_mean)
print('obser_var_unbiased:', obser_var_unbiased)
print('obser_var_biased:', obser_var_biased)

obser_running_mean = (1 - m.momentum) * m.running_mean + m.momentum * obser_mean
obser_running_var = (1 - m.momentum) * m.running_var + m.momentum * obser_var_unbiased
output = m(input)
output_obser = (input[..., 0] - obser_mean[0]) / pow(obser_var_biased[0] + m.eps, 0.5)  # manual normalization
print('obser_running_mean:', obser_running_mean)
print('obser_running_var:', obser_running_var)
print('running_mean:', m.running_mean)
print('running_var:', m.running_var)
print('output[...,0]:', output[..., 0])  # normalized data
print('output_obser:', output_obser)

Output:
running_mean: tensor([0., 0., 0., 0., 0., 0., 0., 0.])
running_var: tensor([1., 1., 1., 1., 1., 1., 1., 1.])
weight: Parameter containing:
tensor([1., 1., 1., 1., 1., 1., 1., 1.], requires_grad=True)
bias: Parameter containing:
tensor([0., 0., 0., 0., 0., 0., 0., 0.], requires_grad=True)
input: tensor([[ 0.4901, -0.1794,  1.1301, -0.1901, -0.7794, -0.2863, -0.9673,  1.5712],
        [ 0.7150, -0.6555,  0.1724,  1.8487,  0.3064, -0.0863,  1.3970,  0.3117],
        [-1.4870, -1.0768, -0.8371,  1.7132,  0.9250,  0.6004, -0.2488,  0.8714],
        [-0.7459,  0.3344, -1.1203,  1.7061,  0.6755, -0.2490,  1.4969,  0.6247],
        [-0.2600,  2.0536,  0.5194, -1.4121, -1.3856, -0.2249,  0.0729, -0.4737]])
input[...,0]: tensor([ 0.4901,  0.7150, -1.4870, -0.7459, -0.2600])
obser_mean: tensor([-0.2576,  0.0953, -0.0271,  0.7332, -0.0516, -0.0492,  0.3501,  0.5810])
obser_var_unbiased: tensor([0.8138, 1.4763, 0.8822, 2.1516, 0.9799, 0.1376, 1.1456, 0.5629])
obser_var_biased: tensor([0.6510, 1.1810, 0.7058, 1.7213, 0.7839, 0.1101, 0.9165, 0.4503])
obser_running_mean: tensor([-0.0258,  0.0095, -0.0027,  0.0733, -0.0052, -0.0049,  0.0350,  0.0581])
obser_running_var: tensor([0.9814, 1.0476, 0.9882, 1.1152, 0.9980, 0.9138, 1.0146, 0.9563])
running_mean: tensor([-0.0258,  0.0095, -0.0027,  0.0733, -0.0052, -0.0049,  0.0350,  0.0581])
running_var: tensor([0.9814, 1.0476, 0.9882, 1.1152, 0.9980, 0.9138, 1.0146, 0.9563])
output[...,0]: tensor([ 0.9267,  1.2054, -1.5237, -0.6053, -0.0031],
       grad_fn=<SelectBackward0>)
output_obser: tensor([ 0.9267,  1.2054, -1.5237, -0.6053, -0.0031])
(2) training = False, track_running_stats = True: the model is in eval mode. Normalization uses the mean and variance stored in the model, and the model's mean and variance are not updated.
# eval phase
m.eval()
print(m.training)
print(m.track_running_stats)
input = torch.randn(5, 8)

obser_mean = torch.mean(input, dim=0)  # per-feature mean of the input
obser_var_biased = torch.var(input, dim=0, unbiased=False)  # biased variance
print('obser_mean:', obser_mean)
print('obser_var_biased:', obser_var_biased)
print('running_mean:', m.running_mean)
print('running_var:', m.running_var)

output = m(input)
output_obser = (input[..., 0] - obser_mean[0]) / pow(obser_var_biased[0] + m.eps, 0.5)  # normalization with batch statistics
output2_obser = (input[..., 0] - m.running_mean[0]) / pow(m.running_var[0] + m.eps, 0.5)  # normalization with stored statistics
print('output[...,0]:', output[..., 0])
print('output_obser:', output_obser)
print('output2_obser:', output2_obser)

Output:
False
True
obser_mean: tensor([-0.1944, -0.0978, -0.2910,  0.1413, -0.3192,  0.2840,  0.2374,  0.0797])
obser_var_biased: tensor([2.0099, 0.1910, 0.8365, 0.2942, 1.6348, 1.7859, 1.5220, 0.6940])
running_mean: tensor([ 0.0830,  0.0147,  0.0465,  0.0591,  0.0311,  0.0579, -0.0096,  0.0033])
running_var: tensor([1.0621, 0.9541, 0.9964, 1.1851, 0.9764, 1.0526, 1.0279, 0.9809])
output[...,0]: tensor([ 0.4747, -2.2079, -1.2251,  1.7868, -0.1744],
       grad_fn=<SelectBackward0>)
output_obser: tensor([ 0.5407, -1.4093, -0.6949,  1.4946,  0.0689])
output2_obser: tensor([ 0.4747, -2.2079, -1.2251,  1.7868, -0.1744])
(3) training = True or False, track_running_stats = False: whether in training or eval mode, the model normalizes with the current batch's mean and variance and does not update the model's mean and variance.
import torch

m = torch.nn.BatchNorm1d(8, momentum=0.1, affine=True, track_running_stats=False)
print('running_mean:', m.running_mean)  # initial values
print('running_var:', m.running_var)

input = torch.randn(5, 8)
obser_mean = torch.Tensor([input[..., i].mean() for i in range(8)])  # per-feature mean of the input
obser_var_biased = torch.Tensor([input[..., i].var(unbiased=False) for i in range(8)])  # biased variance
print('obser_mean:', obser_mean)
print('obser_var_biased:', obser_var_biased)
output = m(input)
output_obser = (input[..., 0] - obser_mean[0]) / pow(obser_var_biased[0] + m.eps, 0.5)  # manual normalization
print('running_mean:', m.running_mean)
print('running_var:', m.running_var)
print('output[...,0]:', output[..., 0])  # normalized data
print('output_obser:', output_obser)


# eval phase
m.eval()
print(m.training)
print(m.track_running_stats)
input = torch.randn(5, 8)

obser_mean = torch.mean(input, dim=0)  # per-feature mean of the input
obser_var_biased = torch.var(input, dim=0, unbiased=False)  # biased variance
print('obser_mean:', obser_mean)
print('obser_var_biased:', obser_var_biased)
print('running_mean:', m.running_mean)
print('running_var:', m.running_var)

output = m(input)
output_obser = (input[..., 0] - obser_mean[0]) / pow(obser_var_biased[0] + m.eps, 0.5)  # manual normalization
print('output[...,0]:', output[..., 0])
print('output_obser:', output_obser)

Output:
running_mean: None
running_var: None
obser_mean: tensor([-0.4277,  0.2008, -0.3871,  0.4741,  0.5016,  0.6817, -0.2613,  0.0763])
obser_var_biased: tensor([0.3961, 0.7895, 1.1211, 0.2614, 0.2954, 0.4563, 1.8461, 0.7862])
running_mean: None
running_var: None
output[...,0]: tensor([ 0.6016,  1.0995, -1.0294,  0.6994, -1.3712],
       grad_fn=<SelectBackward0>)
output_obser: tensor([ 0.6016,  1.0995, -1.0294,  0.6994, -1.3712])
False
False
obser_mean: tensor([-0.7911, -0.0979,  0.5710, -0.8198,  0.3552, -0.0772,  0.7881,  0.7573])
obser_var_biased: tensor([2.0702, 1.2274, 1.2483, 0.5527, 0.3471, 0.2689, 1.0752, 0.7770])
running_mean: None
running_var: None
output[...,0]: tensor([ 0.1504,  0.6443, -0.4143,  1.2792, -1.6596],
       grad_fn=<SelectBackward0>)
output_obser: tensor([ 0.1504,  0.6443, -0.4143,  1.2792, -1.6596])
<2>torch.nn.BatchNorm2d
CLASS torch.nn.BatchNorm2d(num_features, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True, device=None, dtype=None)
Parameters:
num_features (int) – C from an expected input of size (N, C, H, W).
Shape:
Input: (N, C, H, W)
Output: (N, C, H, W) (same shape as input)
BatchNorm2d differs from BatchNorm1d in the shape of the input it accepts. When computing the mean and variance of an observed batch, the statistics are taken over all elements of a given channel across the whole batch. Given input $x \in \mathbb{R}^{N \times C \times H \times W}$:
Mean of channel $c$: $\mu_c = \frac{1}{NHW}\sum_{n=1}^{N}\sum_{h=1}^{H}\sum_{w=1}^{W} x_{n,c,h,w}$.
Variance of channel $c$: $\sigma_c^2 = \frac{1}{NHW}\sum_{n=1}^{N}\sum_{h=1}^{H}\sum_{w=1}^{W} (x_{n,c,h,w} - \mu_c)^2$.
Normalization of each element of channel $c$: $y_{n,c,h,w} = \frac{x_{n,c,h,w} - \mu_c}{\sqrt{\sigma_c^2 + \epsilon}}$.
BatchNorm2d updates its running mean and variance, and normalizes data, in exactly the same way as BatchNorm1d (a sketch verifying this equivalence follows the example below).
import torch

m = torch.nn.BatchNorm2d(3, eps=0, momentum=0.5, affine=True, track_running_stats=True)
print('running_mean:', m.running_mean)  # initial values
print('running_var:', m.running_var)
print('weight:', m.weight)
print('bias:', m.bias)

input = torch.randn(1, 3, 5, 5)
print('input[0][0]:', input[0][0])  # first sample, first channel

obser_mean = torch.Tensor([input[0][i].mean() for i in range(3)])  # per-channel mean of the input
obser_var_unbiased = torch.Tensor([input[0][i].var() for i in range(3)])  # unbiased variance, equivalent to torch.var(input[0][i], unbiased=True)
obser_var_biased = torch.Tensor([input[0][i].var(unbiased=False) for i in range(3)])  # biased variance, equivalent to torch.var(input[0][i], unbiased=False)
print('obser_mean:', obser_mean)
print('obser_var_unbiased:', obser_var_unbiased)
print('obser_var_biased:', obser_var_biased)
obser_running_mean = (1 - m.momentum) * m.running_mean + m.momentum * obser_mean
obser_running_var = (1 - m.momentum) * m.running_var + m.momentum * obser_var_unbiased
output = m(input)
print('obser_running_mean:', obser_running_mean)
print('obser_running_var:', obser_running_var)
print('running_mean:', m.running_mean)
print('running_var:', m.running_var)
output_obser = (input[0][0] - obser_mean[0]) / pow(obser_var_biased[0] + m.eps, 0.5)  # manual normalization
print('output[0][0]:', output[0][0])  # normalized data
print('output_obser:', output_obser)

Output:
running_mean: tensor([0., 0., 0.])
running_var: tensor([1., 1., 1.])
weight: Parameter containing:
tensor([1., 1., 1.], requires_grad=True)
bias: Parameter containing:
tensor([0., 0., 0.], requires_grad=True)
input[0][0]: tensor([[-0.3550, -1.1596, -0.4947, -0.8188, -0.1722],
        [-1.3371,  0.9375,  0.5564,  2.3561, -0.5711],
        [ 0.3932,  2.6657,  0.3440, -0.9300,  0.1791],
        [-1.0307,  0.2115,  0.4953,  1.8088,  0.0496],
        [-1.0584,  0.4566, -0.1415,  1.2106,  0.4498]])
obser_mean: tensor([ 0.1618,  0.2137, -0.0836])
obser_var_unbiased: tensor([1.1056, 1.1707, 0.9300])
obser_var_biased: tensor([1.0613, 1.1239, 0.8928])
obser_running_mean: tensor([ 0.0809,  0.1068, -0.0418])
obser_running_var: tensor([1.0528, 1.0853, 0.9650])
running_mean: tensor([ 0.0809,  0.1068, -0.0418])
running_var: tensor([1.0528, 1.0853, 0.9650])
output[0][0]: tensor([[-0.5016, -1.2827, -0.6372, -0.9518, -0.3242],
        [-1.4549,  0.7529,  0.3830,  2.1300, -0.7114],
        [ 0.2246,  2.4305,  0.1768, -1.0598,  0.0168],
        [-1.1575,  0.0482,  0.3237,  1.5987, -0.1089],
        [-1.1844,  0.2861, -0.2944,  1.0180,  0.2795]],
       grad_fn=<SelectBackward0>)
output_obser: tensor([[-0.5016, -1.2827, -0.6372, -0.9518, -0.3242],
        [-1.4549,  0.7529,  0.3830,  2.1300, -0.7114],
        [ 0.2246,  2.4305,  0.1768, -1.0598,  0.0168],
        [-1.1575,  0.0482,  0.3237,  1.5987, -0.1089],
        [-1.1844,  0.2861, -0.2944,  1.0180,  0.2795]])
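To back up the claim that BatchNorm2d follows the same mechanism as BatchNorm1d, here is a minimal sketch (the shapes are arbitrary, and affine=False is used only to restrict the comparison to the statistics): flattening (N, C, H, W) to (N*H*W, C) and applying BatchNorm1d reproduces BatchNorm2d exactly, including the running-statistics update.

import torch

torch.manual_seed(0)
x = torch.randn(2, 3, 4, 4)

bn2d = torch.nn.BatchNorm2d(3, affine=False)
bn1d = torch.nn.BatchNorm1d(3, affine=False)

out2d = bn2d(x)
# Flatten (N, C, H, W) -> (N*H*W, C) so each channel becomes one feature column
x_flat = x.permute(0, 2, 3, 1).reshape(-1, 3)
out1d = bn1d(x_flat).reshape(2, 4, 4, 3).permute(0, 3, 1, 2)

print(torch.allclose(out2d, out1d, atol=1e-6))               # True: same normalization
print(torch.allclose(bn2d.running_mean, bn1d.running_mean))  # True: same running mean update
print(torch.allclose(bn2d.running_var, bn1d.running_var))    # True: same running variance update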