1. By default, the optimizer trains all trainable variables that the objective function depends on. If you do not want a particular variable to be trained, set its trainable keyword argument to False. For example (a fuller training-step sketch follows the snippet):
- global_step = tf.Variable(0, trainable=False, dtype=tf.int32)
- learning_rate = 0.01 * 0.99 ** tf.cast(global_step, tf.float32)
-
- increment_step = global_step.assign_add(1)
- optimizer = tf.train.GradientDescentOptimizer(learning_rate)  # the learning rate can be a tensor
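The snippet above builds a decaying learning rate but never runs a training step. A minimal sketch of how it might be wired together, assuming a scalar loss tensor (hypothetical, defined elsewhere); passing global_step to minimize() lets the optimizer increment the step itself, which in turn advances the decaying learning_rate:
- # Sketch only: 'loss' is an assumed scalar tensor computed elsewhere.
- # minimize() increments global_step on every application of the gradients.
- train_op = optimizer.minimize(loss, global_step=global_step)
- with tf.Session() as sess:
-     sess.run(tf.global_variables_initializer())
-     for _ in range(100):
-         _, lr = sess.run([train_op, learning_rate])  # lr decays as global_step grows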
2. The full signature of the tf.Variable class (a brief usage sketch follows):
tf.Variable(initial_value=None, trainable=True, collections=None, validate_shape=True, caching_device=None, name=None, variable_def=None, dtype=None, expected_shape=None, import_scope=None)
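As a quick illustration of the most commonly used arguments (the variable names and shapes here are arbitrary, chosen only for the example):
- # Illustrative only: a non-trainable int32 counter and a named weight matrix.
- step = tf.Variable(0, trainable=False, dtype=tf.int32, name='step')
- weights = tf.Variable(tf.random_normal([784, 10]), name='weights',
-                       collections=[tf.GraphKeys.GLOBAL_VARIABLES])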
3. You can ask the optimizer to compute the gradients with respect to specific variables, and you can also modify the gradients before the optimizer applies them. A common practical use is gradient clipping; a sketch follows the block below.
- # Create an optimizer.
- optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.1)
-
- # Compute the gradients for a list of variables.
- grads_and_vars = optimizer.compute_gradients(loss, <list of variables>)
-
- # grads_and_vars is a list of (gradient, variable) tuples. Do whatever you
- # need to the 'gradient' part, for example, subtract 1.0 from each of them.
- subtracted_grads_and_vars = [(gv[0] - 1.0, gv[1]) for gv in grads_and_vars]
-
- # Ask the optimizer to apply the modified gradients.
- optimizer.apply_gradients(subtracted_grads_and_vars)
4. Lower-level functionality: tf.gradients
Signature:
tf.gradients(ys, xs, grad_ys=None, name='gradients', colocate_gradients_with_ops=False, gate_gradients=False, aggregation_method=None, stop_gradients=None)
Explanation:
At its core this function implements differentiation: ∂y/∂x. In TensorFlow, both y and x are tensors. The ys and xs arguments of tf.gradients() can each be not just a single tensor but also a list of the form [tensor1, tensor2, …, tensorn]. When ys and xs are both lists, tf.gradients() differentiates ys with respect to xs and returns a list of length len(xs), one gradient per element of xs. For example, with ys = [y1, y2] and xs = [x1, x2, x3], the actual computation is
grad_i = ∂y1/∂x_i + ∂y2/∂x_i, for i = 1, 2, 3,
i.e. the gradients of the different ys with respect to the same x are summed.
This is especially useful when you only want to train certain parts of a model; see the sketch below.
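A minimal runnable sketch of both points, assuming TensorFlow 1.x; the variables and tensors here are made up for illustration. It shows the per-xs summation of gradients and how manually computed gradients can be applied to just one variable:
- import tensorflow as tf
-
- x1 = tf.Variable(2.0)
- x2 = tf.Variable(3.0)
- y1 = x1 * x2          # dy1/dx1 = x2, dy1/dx2 = x1
- y2 = x1 + x2          # dy2/dx1 = 1,  dy2/dx2 = 1
-
- # One gradient per element of xs; contributions from y1 and y2 are summed.
- grads = tf.gradients(ys=[y1, y2], xs=[x1, x2])
-
- # Train only x1: compute its gradient by hand and apply it to x1 alone.
- opt = tf.train.GradientDescentOptimizer(0.1)
- train_x1_only = opt.apply_gradients(list(zip(tf.gradients(y1 + y2, [x1]), [x1])))
-
- with tf.Session() as sess:
-     sess.run(tf.global_variables_initializer())
-     print(sess.run(grads))  # [4.0, 3.0], i.e. [x2 + 1, x1 + 1]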
5. More optimizers
Takeaway: RMSprop is an extension of Adagrad that deals with its radically diminishing learning rates. It is identical to Adadelta, except that Adadelta uses the RMS of parameter updates in the numerator update rule. Adam, finally, adds bias-correction and momentum to RMSprop. Insofar, RMSprop, Adadelta, and Adam are very similar algorithms that do well in similar circumstances. Kingma et al. [15] show that its bias-correction helps Adam slightly outperform RMSprop towards the end of optimization as gradients become sparser. Insofar, Adam might be the best overall choice.
Recommendation: use AdamOptimizer.
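For completeness, a short sketch of how the recommended optimizer is typically instantiated in TensorFlow 1.x; the values below are simply the library defaults spelled out, and loss is again an assumed scalar tensor:
- # AdamOptimizer with the default hyperparameters made explicit.
- optimizer = tf.train.AdamOptimizer(learning_rate=0.001, beta1=0.9,
-                                    beta2=0.999, epsilon=1e-08)
- train_op = optimizer.minimize(loss)  # 'loss' is assumed to be defined elsewhere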