1. Principle
The formula is as follows:
y = γ(x - μ)/σ + β
where x is the input, y is the output, μ is the mean, σ is the standard deviation, and γ and β are the scale and offset coefficients.
These parameters are usually per-channel: for example, if the input x is a 16*32*32*128 feature map (NHWC format), then the parameters above are all 128-dimensional vectors. γ and β are optional; when present they are learnable parameters (taking part in the forward and backward passes), and when absent the formula reduces to y = (x - μ)/σ. As for μ and σ, the statistics of the current batch are used during training, while at test/prediction time the moving averages computed during training are used instead.
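To make the per-channel computation concrete, here is a minimal NumPy sketch of the formula above; the shapes and the epsilon value are assumptions chosen just for illustration:

import numpy as np

x = np.random.randn(16, 32, 32, 128).astype(np.float32)  # NHWC feature map
gamma = np.ones(128, dtype=np.float32)   # scale, one value per channel
beta = np.zeros(128, dtype=np.float32)   # offset, one value per channel
eps = 1e-3                               # small constant to avoid dividing by 0

# batch statistics: reduce over N, H, W so that mu and sigma are 128-dim vectors
mu = x.mean(axis=(0, 1, 2))
sigma = np.sqrt(x.var(axis=(0, 1, 2)) + eps)   # standard deviation

y = gamma * (x - mu) / sigma + beta
print(y.shape)   # (16, 32, 32, 128)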
2. Usage in TensorFlow
TensorFlow provides three main implementations of batch normalization:
tf.nn.batch_normalization
tf.layers.batch_normalization
tf.contrib.layers.batch_norm
The level of encapsulation increases from one to the next. tf.layers.batch_normalization or tf.contrib.layers.batch_norm is recommended, since they are documented in more detail on the official TensorFlow website.
2.1 tf.nn.batch_normalization
tf.nn.batch_normalization(
    x,
    mean,
    variance,
    offset,
    scale,
    variance_epsilon,
    name=None
)

Args:
    x: Input Tensor of arbitrary dimensionality.
    mean: A mean Tensor.
    variance: A variance Tensor.
    offset: An offset Tensor, often denoted β in equations, or None. If present, will be added to the normalized tensor.
    scale: A scale Tensor, often denoted γ in equations, or None. If present, the scale is applied to the normalized tensor.
    variance_epsilon: A small float number to avoid dividing by 0.
    name: A name for this operation (optional).

Returns:
    the normalized, scaled, offset tensor.
Example:

import tensorflow as tf
import numpy as np

w1_initial = np.random.normal(size=(784,100)).astype(np.float32)
w2_initial = np.random.normal(size=(100,100)).astype(np.float32)

x = tf.placeholder(tf.float32, shape=[None, 784])
w1 = tf.Variable(w1_initial)
b1 = tf.Variable(tf.zeros([100]))
z1 = tf.matmul(x,w1)+b1
print("z1.shape:",z1.shape)
l1 = tf.nn.sigmoid(z1)
print("l1.shape:",l1.shape)
# axes=[0]: mean/variance over each column; axes=[1]: over each row; axes=[0,1]: over all elements
batch_mean2, batch_var2 = tf.nn.moments(l1,[0])
print("batch_mean2.shape:",batch_mean2.shape)
print("batch_var2.shape:",batch_var2.shape)
scale2 = tf.Variable(tf.ones([100]))
beta2 = tf.Variable(tf.zeros([100]))
epsilon = 1e-3
BN2 = tf.nn.batch_normalization(l1,batch_mean2,batch_var2,beta2,scale2,epsilon)
print("BN2.shape:",BN2.shape)

Output:

z1.shape: (?, 100)
l1.shape: (?, 100)
batch_mean2.shape: (100,)
batch_var2.shape: (100,)
BN2.shape: (?, 100)
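Note that tf.nn.batch_normalization only applies the formula; it does not track moving averages for you. Continuing the example above, here is a hedged sketch of how the train/test behavior described in section 1 could be wired up by hand (the names pop_mean, pop_var, is_training and the decay value are assumptions made up for this illustration, not part of the API):

decay = 0.99
pop_mean = tf.Variable(tf.zeros([100]), trainable=False)   # moving average of the mean
pop_var = tf.Variable(tf.ones([100]), trainable=False)     # moving average of the variance
is_training = tf.placeholder(tf.bool)

def bn_train():
    # update the moving averages from the current batch, then normalize with batch statistics
    update_mean = tf.assign(pop_mean, pop_mean * decay + batch_mean2 * (1 - decay))
    update_var = tf.assign(pop_var, pop_var * decay + batch_var2 * (1 - decay))
    with tf.control_dependencies([update_mean, update_var]):
        return tf.nn.batch_normalization(l1, batch_mean2, batch_var2, beta2, scale2, epsilon)

def bn_test():
    # at test time, use the moving averages saved during training
    return tf.nn.batch_normalization(l1, pop_mean, pop_var, beta2, scale2, epsilon)

BN2 = tf.cond(is_training, bn_train, bn_test)

The higher-level wrappers below take care of exactly this bookkeeping for you.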
2.2 tf.layers.batch_normalization
batch_normalization(inputs,
    axis=-1,
    momentum=0.99,
    epsilon=1e-3,
    center=True,
    scale=True,
    beta_initializer=init_ops.zeros_initializer(),
    gamma_initializer=init_ops.ones_initializer(),
    moving_mean_initializer=init_ops.zeros_initializer(),
    moving_variance_initializer=init_ops.ones_initializer(),
    beta_regularizer=None,
    gamma_regularizer=None,
    beta_constraint=None,
    gamma_constraint=None,
    training=False,
    trainable=True,
    name=None,
    reuse=None,
    renorm=False,
    renorm_clipping=None,
    renorm_momentum=0.99,
    fused=None,
    virtual_batch_size=None,
    adjustment=None):
Note: during training, moving_mean and moving_variance need to be updated. By default the update ops are placed in tf.GraphKeys.UPDATE_OPS, so they need to be added as dependencies of train_op. Also, be sure to build the batch_normalization ops before fetching the update_ops collection; otherwise update_ops will be empty and training/inference will not work correctly. For example:
x_norm = tf.layers.batch_normalization(x, training=training)

# ...

update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op = optimizer.minimize(loss)
Two things need attention when training: (1) pass training=True, so that the batch statistics (mean, variance, etc.) are accumulated; at test time the input is sometimes a single sample, for which a batch mean and variance cannot be computed, so set training=False and the moving averages saved during training are used instead; (2) when building the training step, add the code above, i.e. add update_ops as a dependency of the final train_op. Without this control dependency, test accuracy will be severely abnormal.
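For reference, here is a minimal end-to-end sketch that combines the two points; the network, the optimizer and the placeholder names (x, y_, training) are assumptions made up for this example rather than the only correct setup:

import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 784])
y_ = tf.placeholder(tf.int64, [None])
training = tf.placeholder(tf.bool)   # (1) feed True during training, False at test time

h = tf.layers.dense(x, 100)
h = tf.layers.batch_normalization(h, training=training)
h = tf.nn.relu(h)
logits = tf.layers.dense(h, 10)

loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y_, logits=logits))

# (2) fetch update_ops after the batch_normalization op has been built,
# and make train_op depend on it so moving_mean/moving_variance get updated
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op = tf.train.GradientDescentOptimizer(0.01).minimize(loss)

# training:  sess.run(train_op, feed_dict={x: ..., y_: ..., training: True})
# inference: sess.run(logits,   feed_dict={x: ..., training: False})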
2.3 tf.contrib.layers.batch_norm
tf.contrib.layers.batch_norm(
    inputs,
    decay=0.999,
    center=True,
    scale=False,
    epsilon=0.001,
    activation_fn=None,
    param_initializers=None,
    param_regularizers=None,
    updates_collections=tf.GraphKeys.UPDATE_OPS,
    is_training=True,
    reuse=None,
    variables_collections=None,
    outputs_collections=None,
    trainable=True,
    batch_weights=None,
    fused=None,
    data_format=DATA_FORMAT_NHWC,
    zero_debias_moving_mean=False,
    scope=None,
    renorm=False,
    renorm_clipping=None,
    renorm_decay=0.99,
    adjustment=None
)
During training, moving_mean and moving_variance need to be updated. By default the update ops are placed in tf.GraphKeys.UPDATE_OPS, so they need to be added as dependencies of train_op.
This function is used in much the same way as tf.layers.batch_normalization. The main difference between the two lies in the default values of the scale and center parameters; these correspond to the scale and offset of the linear transformation applied to the input after it has been normalized by its mean and variance, as described in the principle section. Both default the offset (center) to True, but scale defaults to True in tf.layers.batch_normalization and to False in tf.contrib.layers.batch_norm. In other words, by default tf.contrib.layers.batch_norm does not apply a learned scaling to the normalized input and only adds an offset.
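A hedged usage sketch is shown below (the tensor shapes and names are assumptions for illustration only); scale=True is passed explicitly precisely because the default here is False:

import tensorflow as tf

inputs = tf.placeholder(tf.float32, [None, 32, 32, 128])
is_training = tf.placeholder(tf.bool)

out = tf.contrib.layers.batch_norm(
    inputs,
    center=True,                                   # learn the offset β (default)
    scale=True,                                    # also learn the scale γ (default is False)
    is_training=is_training,
    updates_collections=tf.GraphKeys.UPDATE_OPS)   # update ops are collected here (default)

# as with tf.layers.batch_normalization, add the update ops as a dependency of train_op:
# update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
# with tf.control_dependencies(update_ops):
#     train_op = optimizer.minimize(loss)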
https://blog.csdn.net/huitailangyz/article/details/85015611 # recommended reading
https://www.cnblogs.com/hrlnw/p/7227447.html