
Batch Normalization in TensorFlow

 

1. Principle

The formula is:

y = γ(x − μ)/σ + β

where x is the input, y is the output, μ is the mean, σ is the standard deviation, and γ and β are the scale and offset coefficients. (In practice a small ε is added to the variance before taking the square root, i.e. the division is by √(σ² + ε), to avoid dividing by zero.)

In general, these parameters are computed per channel. For example, if the input x is a 16×32×32×128 feature map (NHWC format), then each of the parameters above is a 128-dimensional vector. γ and β are optional: when present, they are learnable parameters (participating in the forward and backward passes); when absent, the formula simplifies to y = (x − μ)/σ. During training, μ and σ are the statistics computed within the batch; at test/prediction time, the moving averages accumulated during training are used instead.
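To make the per-channel arithmetic concrete, here is a minimal NumPy sketch of the formula above (the shapes match the NHWC example; the variable names are illustrative):

    import numpy as np

    # A toy NHWC feature map: batch 16, height 32, width 32, 128 channels
    x = np.random.randn(16, 32, 32, 128).astype(np.float32)

    # Per-channel statistics: reduce over the N, H, W axes, keep the channel axis
    mu = x.mean(axis=(0, 1, 2))    # shape (128,)
    var = x.var(axis=(0, 1, 2))    # shape (128,)

    gamma = np.ones(128, dtype=np.float32)   # scale γ; learnable in a real network
    beta = np.zeros(128, dtype=np.float32)   # offset β; learnable in a real network
    eps = 1e-3                               # guards against division by zero

    y = gamma * (x - mu) / np.sqrt(var + eps) + beta
    print(y.shape)  # (16, 32, 32, 128)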

 

2. Usage in TensorFlow

TensorFlow's batch normalization is mainly implemented by the following three functions:

tf.nn.batch_normalization

tf.layers.batch_normalization

tf.contrib.layers.batch_norm

They are listed in order of increasing abstraction. tf.layers.batch_normalization or tf.contrib.layers.batch_norm are recommended, since both are documented in detail on the official TensorFlow site.

2.1 tf.nn.batch_normalization

    tf.nn.batch_normalization(
        x,
        mean,
        variance,
        offset,
        scale,
        variance_epsilon,
        name=None
    )
    Args:
        x: Input Tensor of arbitrary dimensionality.
        mean: A mean Tensor.
        variance: A variance Tensor.
        offset: An offset Tensor, often denoted β in equations, or None. If present, it is added to the normalized tensor.
        scale: A scale Tensor, often denoted γ in equations, or None. If present, the scale is applied to the normalized tensor.
        variance_epsilon: A small float number to avoid dividing by 0.
        name: A name for this operation (optional).

    Returns:
        The normalized, scaled, offset tensor.

Example:

    import tensorflow as tf
    import numpy as np

    w1_initial = np.random.normal(size=(784, 100)).astype(np.float32)
    w2_initial = np.random.normal(size=(100, 100)).astype(np.float32)
    x = tf.placeholder(tf.float32, shape=[None, 784])
    w1 = tf.Variable(w1_initial)
    b1 = tf.Variable(tf.zeros([100]))
    z1 = tf.matmul(x, w1) + b1
    print("z1.shape:", z1.shape)
    l1 = tf.nn.sigmoid(z1)
    print("l1.shape:", l1.shape)
    # axes=[0]: mean/variance per column (over the batch); [1]: per row; [0, 1]: over all elements
    batch_mean2, batch_var2 = tf.nn.moments(l1, [0])
    print("batch_mean2.shape:", batch_mean2.shape)
    print("batch_var2.shape:", batch_var2.shape)
    scale2 = tf.Variable(tf.ones([100]))
    beta2 = tf.Variable(tf.zeros([100]))
    epsilon = 1e-3
    BN2 = tf.nn.batch_normalization(l1, batch_mean2, batch_var2, beta2, scale2, epsilon)
    print("BN2.shape:", BN2.shape)
    z1.shape: (?, 100)
    l1.shape: (?, 100)
    batch_mean2.shape: (100,)
    batch_var2.shape: (100,)
    BN2.shape: (?, 100)
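Note that tf.nn.batch_normalization only performs the normalization arithmetic; computing the statistics (and switching to moving averages at inference time) is left entirely to the caller. A minimal sketch of actually running the graph above (the batch size of 64 and the random input are made up for illustration):

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        batch = np.random.randn(64, 784).astype(np.float32)  # illustrative input batch
        out = sess.run(BN2, feed_dict={x: batch})
        print(out.shape)  # (64, 100)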

2.2 tf.layers.batch_normalization

    batch_normalization(
        inputs,
        axis=-1,
        momentum=0.99,
        epsilon=1e-3,
        center=True,
        scale=True,
        beta_initializer=init_ops.zeros_initializer(),
        gamma_initializer=init_ops.ones_initializer(),
        moving_mean_initializer=init_ops.zeros_initializer(),
        moving_variance_initializer=init_ops.ones_initializer(),
        beta_regularizer=None,
        gamma_regularizer=None,
        beta_constraint=None,
        gamma_constraint=None,
        training=False,
        trainable=True,
        name=None,
        reuse=None,
        renorm=False,
        renorm_clipping=None,
        renorm_momentum=0.99,
        fused=None,
        virtual_batch_size=None,
        adjustment=None)

Note: when training, moving_mean and moving_variance need to be updated. By default the update ops are placed in tf.GraphKeys.UPDATE_OPS, so they need to be added as a dependency of the train_op. Also, be sure to add the batch_normalization ops to the graph before fetching the update_ops collection; otherwise update_ops will be empty, and training/inference will not work properly. For example:

    x_norm = tf.layers.batch_normalization(x, training=training)
    # ...
    update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
    with tf.control_dependencies(update_ops):
        train_op = optimizer.minimize(loss)

Two things need attention during training. (1) Pass training=True so that the per-batch means and variances are folded into the saved moving averages; at test time the input may be a single sample rather than a batch, so batch statistics cannot be computed, and setting training=False makes the layer use the means and variances saved during training instead. (2) When building the loss/train step, add the code above, i.e. make update_ops a dependency of the final train_op. Omitting this control dependency leads to severely abnormal test accuracy. A complete sketch follows.
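Putting both points together, here is a minimal end-to-end sketch (the layer sizes, placeholder names, and optimizer are illustrative assumptions, not part of the API):

    import tensorflow as tf

    x = tf.placeholder(tf.float32, shape=[None, 784])
    labels = tf.placeholder(tf.int64, shape=[None])
    is_training = tf.placeholder(tf.bool)  # True during training, False at test time

    h = tf.layers.dense(x, 100)
    h = tf.layers.batch_normalization(h, training=is_training)  # point (1)
    h = tf.nn.relu(h)
    logits = tf.layers.dense(h, 10)
    loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)

    # Point (2): the moving-average updates live in UPDATE_OPS and must run with the train step
    update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
    with tf.control_dependencies(update_ops):
        train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)

    # Training:  sess.run(train_op, feed_dict={x: ..., labels: ..., is_training: True})
    # Inference: sess.run(logits, feed_dict={x: ..., is_training: False})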

2.3 tf.contrib.layers.batch_norm

    tf.contrib.layers.batch_norm(
        inputs,
        decay=0.999,
        center=True,
        scale=False,
        epsilon=0.001,
        activation_fn=None,
        param_initializers=None,
        param_regularizers=None,
        updates_collections=tf.GraphKeys.UPDATE_OPS,
        is_training=True,
        reuse=None,
        variables_collections=None,
        outputs_collections=None,
        trainable=True,
        batch_weights=None,
        fused=None,
        data_format=DATA_FORMAT_NHWC,
        zero_debias_moving_mean=False,
        scope=None,
        renorm=False,
        renorm_clipping=None,
        renorm_decay=0.99,
        adjustment=None
    )

As above, moving_mean and moving_variance need to be updated during training. By default the update ops are placed in tf.GraphKeys.UPDATE_OPS, so they need to be added as dependencies of the train_op.

This function is used in much the same way as tf.layers.batch_normalization. The main difference lies in the default values of the scale and center parameters, which correspond to the scale (γ) and offset (β) of the linear transform applied after the mean/variance normalization described in section 1. Both default center (the offset) to True, but scale defaults to True in tf.layers.batch_normalization and to False in tf.contrib.layers.batch_norm. In other words, by default tf.contrib.layers.batch_norm does not rescale the normalized input; it only adds an offset. A sketch follows.
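As a minimal sketch (the explicit scale=True restores the learnable γ that this function disables by default; the input shape is illustrative):

    import tensorflow as tf

    x = tf.placeholder(tf.float32, shape=[None, 100])
    is_training = tf.placeholder(tf.bool)

    h = tf.contrib.layers.batch_norm(
        x,
        scale=True,                                   # enable the learnable γ (default False here)
        is_training=is_training,                      # batch stats when training, moving averages otherwise
        updates_collections=tf.GraphKeys.UPDATE_OPS)  # where the moving-average update ops are placed

    # As before, make the update ops a dependency of the train_op:
    # update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
    # with tf.control_dependencies(update_ops):
    #     train_op = optimizer.minimize(loss)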
 

 

