Batch Normalization (BN) is a technique for deep neural networks whose main purpose is to mitigate the internal covariate shift problem during training. Simply put, when training a deep network, the input distribution of each layer can change as the parameters of the preceding layers are updated, and this drift makes training unstable. BN standardizes each layer's inputs to zero mean and unit variance, so every layer receives a relatively stable data distribution.
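To make this concrete, here is a minimal sketch (the batch shape and values are illustrative assumptions, not from the original post) showing that after BN each feature of a batch has mean close to 0 and standard deviation close to 1:

import torch
import torch.nn as nn

# Features are deliberately shifted and scaled; BN should undo this per feature.
x = torch.randn(32, 4) * 10 + 3
bn = nn.BatchNorm1d(4, affine=False)
y = bn(x)
print(y.mean(dim=0))                 # each entry close to 0
print(y.std(dim=0, unbiased=False))  # each entry close to 1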
Applies batch normalization over 2D or 3D input data:
class torch.nn.BatchNorm1d(num_features, eps=1e-05, momentum=0.1, affine=True):
Parameters:
1. num_features: the feature dimension; for input of shape (N, L) this is L, and for (N, C, L) this is C.
2. eps: a value added to the denominator for numerical stability, so the denominator cannot approach 0. Default: 1e-5.
3. momentum: the momentum used for the running mean and running variance, i.e. an exponential moving average of the batch statistics: $\mu_{\text{new}} = (1 - \text{momentum}) \cdot \mu_{\text{last}} + \text{momentum} \cdot \mu$, where $\mu_{\text{new}}$ is the updated running value, $\mu_{\text{last}}$ is the previous running value, and $\mu$ is the statistic computed on the current batch (see the sketch after this list).
4. affine: a boolean indicating whether to add learnable affine transformation parameters to this layer; the affine coefficients are the gamma and beta in the formula below.
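As promised above, a minimal sketch of the running-mean update (the batch values are illustrative assumptions):

import torch
import torch.nn as nn

bn = nn.BatchNorm1d(3)       # running_mean starts at 0, running_var at 1
x = torch.randn(8, 3) + 5.0  # batch whose features have mean around 5

bn.train()                   # running stats are only updated in training mode
bn(x)

batch_mean = x.mean(dim=0)
expected = (1 - bn.momentum) * torch.zeros(3) + bn.momentum * batch_mean
print(torch.allclose(bn.running_mean, expected, atol=1e-6))  # True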
Principle:
Normalize each dimension using its mean and variance, then apply the learned affine transform:
$y = \frac{x - \mathrm{mean}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}} \cdot \gamma + \beta$
import torch
import torch.nn as nn

m = nn.BatchNorm1d(5, affine=False)
m1 = nn.BatchNorm1d(5, affine=True)
input = torch.randn(5, 5)   # Variable is deprecated; tensors work directly
output = m(input)
output1 = m1(input)
print(input, '\n', output, '\n', output1)

tensor([[-0.6046, -0.8939,  1.3246,  0.2621,  1.0777],
        [ 0.9088, -0.6219,  0.9589,  0.7307,  0.5221],
        [ 1.7435,  0.6662, -0.5827,  0.3325, -0.8179],
        [-0.2250,  0.9930,  0.0504, -0.4509,  1.6605],
        [-0.5742,  1.6543,  0.6083,  0.5746, -0.3208]])
tensor([[-0.9212, -1.2920,  1.2648, -0.0680,  0.7249],
        [ 0.7107, -1.0117,  0.7224,  1.0842,  0.1085],
        [ 1.6108,  0.3161, -1.5642,  0.1049, -1.3780],
        [-0.5119,  0.6530, -0.6252, -1.8215,  1.3713],
        [-0.8885,  1.3345,  0.2022,  0.7005, -0.8266]])
tensor([[-0.9212, -1.2920,  1.2648, -0.0680,  0.7249],
        [ 0.7107, -1.0117,  0.7224,  1.0842,  0.1085],
        [ 1.6108,  0.3161, -1.5642,  0.1049, -1.3780],
        [-0.5119,  0.6530, -0.6252, -1.8215,  1.3713],
        [-0.8885,  1.3345,  0.2022,  0.7005, -0.8266]],
       grad_fn=<NativeBatchNormBackward>)
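As a sanity check on the formula above, the following sketch (an assumed setup, not from the original post) reproduces the BatchNorm1d output by hand; note that training-time BN uses the biased batch variance:

import torch
import torch.nn as nn

x = torch.randn(5, 5)
bn = nn.BatchNorm1d(5, affine=False)
out = bn(x)

# unbiased=False matches BN's batch statistics (divide by N, not N-1)
manual = (x - x.mean(dim=0)) / torch.sqrt(x.var(dim=0, unbiased=False) + bn.eps)
print(torch.allclose(out, manual, atol=1e-6))  # True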
Applies batch normalization over a 4D input (a mini-batch of 3D data):
class torch.nn.BatchNorm2d(num_features, eps=1e-05, momentum=0.1, affine=True):
1. num_features: the number of features C from an expected input of size (N, C, H, W).
2. eps: a value added to the denominator for numerical stability (the denominator must not approach or equal 0). Default: 1e-5.
3. momentum: the momentum used for the running mean and running variance. Default: 0.1.
4. affine: a boolean; when set to True, adds learnable affine transformation parameters to this layer.
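When affine=True, gamma is exposed as the layer's weight and beta as its bias, one value per channel. A quick sketch to inspect them (the initialization values are PyTorch's defaults):

import torch.nn as nn

m1 = nn.BatchNorm2d(2, affine=True)
print(m1.weight)  # gamma, shape (2,), initialized to ones
print(m1.bias)    # beta,  shape (2,), initialized to zeros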
Principle:
Normalize each channel using its mean and variance, then apply the learned affine transform:
$y = \frac{x - \mathrm{mean}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}} \cdot \gamma + \beta$
import torch
import torch.nn as nn

m = nn.BatchNorm2d(2, affine=False)
m1 = nn.BatchNorm2d(2, affine=True)
input = torch.randn(1, 2, 5, 5)   # Variable is deprecated; tensors work directly
output = m(input)
output1 = m1(input)
print(input, '\n', output, '\n', output1)

tensor([[[[-0.2606, -0.8874,  0.8364,  0.0184,  0.8040],
          [ 1.0593, -0.6811,  1.3497, -0.6840, -2.0859],
          [-0.5399,  1.3321, -0.6281, -0.9044,  1.7491],
          [ 0.7559,  0.5607, -0.0447, -0.3868,  1.2404],
          [ 1.2078, -0.9642,  0.3980,  0.2087, -1.3940]],

         [[ 0.0493,  0.7372,  1.1964,  0.3862,  0.9900],
          [ 0.3544,  0.1767, -1.5780,  0.1642, -2.1586],
          [-0.4891, -0.7272,  1.6860, -1.6091,  0.9730],
          [-2.4161, -2.2096,  0.4617, -0.2965, -0.5663],
          [-0.0222, -0.7628,  0.6404, -1.4428,  0.5750]]]])
tensor([[[[-0.3522, -0.9959,  0.7743, -0.0657,  0.7410],
          [ 1.0032, -0.7840,  1.3015, -0.7870, -2.2266],
          [-0.6390,  1.2833, -0.7296, -1.0134,  1.7116],
          [ 0.6917,  0.4912, -0.1305, -0.4818,  1.1892],
          [ 1.1557, -1.0748,  0.3242,  0.1298, -1.5161]],

         [[ 0.2560,  0.8743,  1.2870,  0.5588,  1.1015],
          [ 0.5302,  0.3705, -1.2066,  0.3593, -1.7285],
          [-0.2280, -0.4420,  1.7271, -1.2346,  1.0862],
          [-1.9599, -1.7743,  0.6266, -0.0549, -0.2974],
          [ 0.1917, -0.4739,  0.7873, -1.0852,  0.7285]]]])
tensor([[[[-0.3522, -0.9959,  0.7743, -0.0657,  0.7410],
          [ 1.0032, -0.7840,  1.3015, -0.7870, -2.2266],
          [-0.6390,  1.2833, -0.7296, -1.0134,  1.7116],
          [ 0.6917,  0.4912, -0.1305, -0.4818,  1.1892],
          [ 1.1557, -1.0748,  0.3242,  0.1298, -1.5161]],

         [[ 0.2560,  0.8743,  1.2870,  0.5588,  1.1015],
          [ 0.5302,  0.3705, -1.2066,  0.3593, -1.7285],
          [-0.2280, -0.4420,  1.7271, -1.2346,  1.0862],
          [-1.9599, -1.7743,  0.6266, -0.0549, -0.2974],
          [ 0.1917, -0.4739,  0.7873, -1.0852,  0.7285]]]],
       grad_fn=<NativeBatchNormBackward>)
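The same sanity check works for BatchNorm2d: each channel is normalized over the (N, H, W) dimensions. A minimal sketch (assumed setup, not from the original post):

import torch
import torch.nn as nn

x = torch.randn(1, 2, 5, 5)
bn = nn.BatchNorm2d(2, affine=False)
out = bn(x)

mean = x.mean(dim=(0, 2, 3), keepdim=True)
var = x.var(dim=(0, 2, 3), unbiased=False, keepdim=True)  # biased batch variance
manual = (x - mean) / torch.sqrt(var + bn.eps)
print(torch.allclose(out, manual, atol=1e-6))  # True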
BN is usually placed between a fully connected or convolutional layer and the activation function, i.e. (FC/Conv) - BatchNorm - activation. Some people report that placing BatchNorm after the activation works better, so it is worth trying both; a minimal sketch of the common ordering follows below.
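A minimal sketch of the usual ordering (layer sizes here are illustrative assumptions):

import torch.nn as nn

block = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1, bias=False),  # bias is redundant before BN
    nn.BatchNorm2d(16),
    nn.ReLU(inplace=True),
)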
Further reading on the role of BN:
BatchNorm2d原理、作用及其pytorch中BatchNorm2d函数的参数讲解_LS_learner的博客-CSDN博客_batchnorm2d
pytorch中批量归一化BatchNorm1d和BatchNorm2d函数_小白827的博客-CSDN博客_batchnorm1d 2d