What does opt.apply_gradients() do in TensorFlow?

Tags:

tensorflow

The documentation is not quite clear about this. I suppose the gradients one can obtain by opt.compute_gradients(E, [v]) contain ∂E/∂x = g(x) for each element x of the tensor that v stores. Does opt.apply_gradients(grads_and_vars) essentially execute x ← x - η·g(x), where η is the learning rate? That would imply that if I want to add a positive additive change p to the variable, I would need to change g(x) ← g(x) - (1/η)p, e.g. like this:

    import tensorflow as tf

    # lr is the learning rate, p the desired additive change
    opt = tf.train.GradientDescentOptimizer(learning_rate=lr)
    grads_and_vars = opt.compute_gradients(loss, var_list)

    for i, gv in enumerate(grads_and_vars):
        # Shift each gradient by p / lr so that the applied update gains +p
        grads_and_vars[i] = (gv[0] - (1 / lr) * p, gv[1])

    train_op = opt.apply_gradients(grads_and_vars)

Is there a better way to do this?

asked Jun 20 '16 11:06

Lenar Hoyt

2 Answers

The update rule that the apply_gradients method actually applies depends on the specific optimizer. Take a look at the implementation of apply_gradients in the tf.train.Optimizer class. It relies on the derived classes implementing the update rule in the methods _apply_dense and _apply_sparse. The update rule you are referring to is implemented by the GradientDescentOptimizer.
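To make that delegation concrete, here is a toy sketch in plain Python. It is not TensorFlow's implementation: the class names ToyOptimizer, ToyGradientDescent and ToyMomentum are hypothetical, and a "variable" is just a one-element list instead of a tensor. It only illustrates how a base-class apply_gradients can leave the actual update rule to _apply_dense in each subclass.

```python
class ToyOptimizer:
    """Hypothetical stand-in for a base optimizer; not TensorFlow code."""

    def apply_gradients(self, grads_and_vars):
        # The base class only loops over (gradient, variable) pairs;
        # the update rule itself lives in the subclass.
        for grad, var in grads_and_vars:
            self._apply_dense(grad, var)

    def _apply_dense(self, grad, var):
        raise NotImplementedError


class ToyGradientDescent(ToyOptimizer):
    def __init__(self, learning_rate):
        self.learning_rate = learning_rate

    def _apply_dense(self, grad, var):
        # Plain gradient descent: x <- x - eta * g(x)
        var[0] -= self.learning_rate * grad


class ToyMomentum(ToyOptimizer):
    def __init__(self, learning_rate, momentum):
        self.learning_rate = learning_rate
        self.momentum = momentum
        self.accum = {}  # one accumulator per variable, keyed by id()

    def _apply_dense(self, grad, var):
        # Momentum: accum <- momentum * accum + g(x); x <- x - eta * accum
        acc = self.momentum * self.accum.get(id(var), 0.0) + grad
        self.accum[id(var)] = acc
        var[0] -= self.learning_rate * acc


# Feed the same gradient twice to both optimizers.
x_gd, x_mom = [1.0], [1.0]
gd = ToyGradientDescent(learning_rate=0.5)
mom = ToyMomentum(learning_rate=0.5, momentum=0.5)
for _ in range(2):
    gd.apply_gradients([(2.0, x_gd)])
    mom.apply_gradients([(2.0, x_mom)])
```

Both optimizers receive identical gradients, yet x_gd and x_mom end up at different values, which is why the effect of apply_gradients cannot be described without naming the optimizer.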

Regarding your desired positive additive update: If what you are calling opt is an instantiation of GradientDescentOptimizer, then you could indeed achieve what you want to do by

    grads_and_vars = opt.compute_gradients(E, [v])
    eta = opt._learning_rate  # private attribute of GradientDescentOptimizer
    my_grads_and_vars = [(g - (1 / eta) * p, v) for g, v in grads_and_vars]
    opt.apply_gradients(my_grads_and_vars)
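The arithmetic behind this can be checked with plain Python floats. This is only a sketch: the helper sgd_step and the numeric values are made up for illustration, and it assumes the plain gradient descent rule x ← x - η·g(x).

```python
def sgd_step(x, g, eta):
    # Plain gradient descent update: x <- x - eta * g
    return x - eta * g


eta = 0.25  # learning rate (illustrative value)
x = 2.0     # current value of the variable
g = 4.0     # gradient dE/dx at x
p = 0.5     # desired additive change

plain = sgd_step(x, g, eta)                    # x - eta*g
shifted = sgd_step(x, g - (1 / eta) * p, eta)  # x - eta*g + p
```

The shifted step lands exactly p above the plain step. This relies on the update being linear in the gradient with a known factor η; an optimizer that rescales or accumulates gradients would transform the shift as well.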

The more elegant way to do this is probably to write a new optimizer (inheriting from tf.train.Optimizer) that implements your desired update rule directly.

answered Sep 20 '22 19:09

lballes

You can also use the eager execution API.

    import tensorflow as tf
    tf.enable_eager_execution()
    tfe = tf.contrib.eager

    optimizer = tf.train.GradientDescentOptimizer(learning_rate)
    grad = tfe.implicit_gradients(loss)
    optimizer.apply_gradients(grad(model_fn, val_list))

Here is a concrete example:

    import numpy as np
    import tensorflow as tf
    tf.enable_eager_execution()
    tfe = tf.contrib.eager

    W = tfe.Variable(np.random.randn())
    b = tfe.Variable(np.random.randn())

    def linear_regression(inputs):
        return inputs * W + b

    def MSE(model_fn, inputs, labels):
        # n_samples is the number of training examples
        return tf.reduce_sum(tf.pow(model_fn(inputs) - labels, 2)) / (2 * n_samples)

    optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.001)
    grad = tfe.implicit_gradients(MSE)
    # train_X and train_Y are your input data and labels
    optimizer.apply_gradients(grad(linear_regression, train_X, train_Y))
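For intuition about what that single apply_gradients call does to W and b, the same step can be written out by hand in plain Python. This is a sketch under assumptions, not what TensorFlow runs internally: the gradients are derived analytically instead of coming from tfe.implicit_gradients, and the data, starting values and learning rate are made up.

```python
def mse(W, b, xs, ys):
    # Same loss as above: sum of squared errors divided by 2 * n_samples
    n = len(xs)
    return sum((W * x + b - y) ** 2 for x, y in zip(xs, ys)) / (2 * n)


def mse_gradients(W, b, xs, ys):
    # Analytic gradients of the loss with respect to W and b
    n = len(xs)
    residuals = [W * x + b - y for x, y in zip(xs, ys)]
    grad_W = sum(r * x for r, x in zip(residuals, xs)) / n
    grad_b = sum(residuals) / n
    return grad_W, grad_b


# Made-up data following y = 2x
train_X = [1.0, 2.0, 3.0]
train_Y = [2.0, 4.0, 6.0]
W, b = 0.0, 0.0
learning_rate = 0.1

loss_before = mse(W, b, train_X, train_Y)
grad_W, grad_b = mse_gradients(W, b, train_X, train_Y)

# One gradient descent step: x <- x - eta * g(x) for each variable
W -= learning_rate * grad_W
b -= learning_rate * grad_b
loss_after = mse(W, b, train_X, train_Y)
```

After the step the loss is lower than before, which is the effect one training iteration is meant to have.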

answered Sep 21 '22 19:09

Zonyue Li
