
TensorFlow Optimizers (tf1.15 custom optimizers)


I. Overview

1. By default, an optimizer trains all trainable variables that its objective function depends on. If you do not want a particular variable to be trained, set its trainable keyword argument to False. For example:

  global_step = tf.Variable(0, trainable=False, dtype=tf.int32)    # excluded from training
  learning_rate = 0.01 * 0.99 ** tf.cast(global_step, tf.float32)  # decays as global_step grows
  increment_step = global_step.assign_add(1)
  optimizer = tf.train.GradientDescentOptimizer(learning_rate)     # the learning rate can be a tensor
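One way to wire this into a training op is shown below (a minimal sketch, assuming a scalar tensor named loss already exists; passing global_step to minimize() makes the optimizer advance the counter on every step, so the decaying learning rate above takes effect):

  train_op = optimizer.minimize(loss, global_step=global_step)  # applies one SGD step and increments global_step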

2. The full signature of the tf.Variable constructor:

tf.Variable(initial_value=None, trainable=True, collections=None, validate_shape=True, caching_device=None, name=None, variable_def=None, dtype=None, expected_shape=None, import_scope=None)
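For illustration, the non-trainable step counter from the previous example could also be written with these keyword arguments spelled out (a minimal sketch; the name 'global_step' is just a label chosen here):

  global_step = tf.Variable(initial_value=0, trainable=False, dtype=tf.int32, name='global_step')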

3. You can ask the optimizer to compute gradients with respect to specific variables, and you can also modify the gradients it computes before they are applied:

  # Create an optimizer.
  optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.1)
  # Compute the gradients for a list of variables.
  grads_and_vars = optimizer.compute_gradients(loss, <list of variables>)
  # grads_and_vars is a list of tuples (gradient, variable). Do whatever you
  # need to the 'gradient' part, for example, subtract 1.0 from each of them.
  subtracted_grads_and_vars = [(gv[0] - 1.0, gv[1]) for gv in grads_and_vars]
  # Ask the optimizer to apply the subtracted gradients.
  optimizer.apply_gradients(subtracted_grads_and_vars)
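In practice, a more common modification than subtracting a constant is gradient clipping; the sketch below (reusing the loss tensor assumed above and an arbitrary clip norm of 5.0) inserts tf.clip_by_global_norm between compute_gradients and apply_gradients:

  grads_and_vars = optimizer.compute_gradients(loss)
  grads, variables = zip(*grads_and_vars)
  clipped_grads, _ = tf.clip_by_global_norm(grads, clip_norm=5.0)  # rescale so the global norm is at most 5.0
  train_op = optimizer.apply_gradients(list(zip(clipped_grads, variables)))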

4. Lower-level functionality

Signature:

tf.gradients(ys, xs, grad_ys=None, name='gradients', colocate_gradients_with_ops=False, gate_gradients=False, aggregation_method=None, stop_gradients=None)

Explanation:

The core job of this function is differentiation, i.e. computing ∂y/∂x. In TensorFlow, both y and x are tensors.

The arguments ys and xs accepted by tf.gradients() can be single tensors or lists of tensors of the form [tensor1, tensor2, …, tensorn]. When ys and xs are both lists, the differentiation behaves as follows:

  • tf.gradients() differentiates ys with respect to xs
  • the return value is a list whose length equals len(xs)
  • suppose the return value is [grad1, grad2, grad3], with ys = [y1, y2] and xs = [x1, x2, x3]; then the actual computation is:

      grad1 = ∂y1/∂x1 + ∂y2/∂x1
      grad2 = ∂y1/∂x2 + ∂y2/∂x2
      grad3 = ∂y1/∂x3 + ∂y2/∂x3

(each returned gradient is the sum, over all elements of ys, of the derivative with respect to the corresponding element of xs)

This is especially useful when you only want to train parts of a model.
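A minimal runnable sketch of this summing behavior (assuming TensorFlow 1.x; the tensors x1, x2, x3, y1, y2 are made up for illustration):

  import tensorflow as tf  # TensorFlow 1.x

  x1 = tf.constant(1.0)
  x2 = tf.constant(2.0)
  x3 = tf.constant(3.0)
  y1 = x1 * x2           # ∂y1/∂x1 = 2, ∂y1/∂x2 = 1, y1 does not depend on x3
  y2 = x2 * x3           # ∂y2/∂x2 = 3, ∂y2/∂x3 = 2, y2 does not depend on x1

  grads = tf.gradients([y1, y2], [x1, x2, x3])
  with tf.Session() as sess:
      print(sess.run(grads))  # [2.0, 4.0, 2.0]: each entry sums the derivatives of y1 and y2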

5. More optimizers

  • tf.train.GradientDescentOptimizer
  • tf.train.AdadeltaOptimizer
  • tf.train.AdagradOptimizer
  • tf.train.AdagradDAOptimizer
  • tf.train.MomentumOptimizer
  • tf.train.AdamOptimizer
  • tf.train.FtrlOptimizer
  • tf.train.ProximalGradientDescentOptimizer
  • tf.train.ProximalAdagradOptimizer
  • tf.train.RMSPropOptimizer

Conclusion on usage: RMSprop is an extension of Adagrad that deals with its radically diminishing learning rates. It is identical to Adadelta, except that Adadelta uses the RMS of parameter updates in the numerator update rule. Adam, finally, adds bias-correction and momentum to RMSprop. Insofar, RMSprop, Adadelta, and Adam are very similar algorithms that do well in similar circumstances. Kingma et al. [15] show that its bias-correction helps Adam slightly outperform RMSprop towards the end of optimization as gradients become sparser. Insofar, Adam might be the best overall choice.

Recommendation: use AdamOptimizer.
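A typical invocation looks like the following (a minimal sketch; 0.001 is simply the TF 1.x default learning rate made explicit, and loss and global_step are assumed to be defined as above):

  optimizer = tf.train.AdamOptimizer(learning_rate=0.001)
  train_op = optimizer.minimize(loss, global_step=global_step)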

II. Code example and explanation
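Below is a minimal end-to-end sketch tying the pieces from Section I together (assumptions: TensorFlow 1.x, a toy linear-regression loss invented for illustration, and the decaying learning rate from Section I):

  import numpy as np
  import tensorflow as tf  # TensorFlow 1.x

  # Toy data: y = 3x + noise
  x_data = np.random.rand(100).astype(np.float32)
  y_data = 3.0 * x_data + np.random.normal(0.0, 0.01, 100).astype(np.float32)

  # Model: a single trainable weight; global_step is excluded from training.
  w = tf.Variable(0.0, name='w')
  global_step = tf.Variable(0, trainable=False, dtype=tf.int32)

  loss = tf.reduce_mean(tf.square(w * x_data - y_data))

  # Decaying learning rate driven by global_step, as in Section I.
  learning_rate = 0.01 * 0.99 ** tf.cast(global_step, tf.float32)
  optimizer = tf.train.GradientDescentOptimizer(learning_rate)
  train_op = optimizer.minimize(loss, global_step=global_step)  # minimize() also increments global_step

  with tf.Session() as sess:
      sess.run(tf.global_variables_initializer())
      for _ in range(200):
          _, cur_loss = sess.run([train_op, loss])
      print('final w:', sess.run(w), 'final loss:', cur_loss)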

 
