 

What does opt.apply_gradients() do in TensorFlow?

Tags:

tensorflow

The documentation is not quite clear about this. I suppose the gradients one can obtain by opt.compute_gradients(E, [v]) contain the ∂E/∂x = g(x) for each element x of the tensor that v stores. Does opt.apply_gradients(grads_and_vars) essentially execute x ← x − η·g(x), where η is the learning rate? That would imply that if I want to add a positive additive change p to the variable, I would need to change g(x) ← g(x) − (1/η)·p, e.g. like this:

opt = tf.train.GradientDescentOptimizer(learning_rate=eta)
grads_and_vars = opt.compute_gradients(loss, var_list)

# p is the desired positive additive change
for i, gv in enumerate(grads_and_vars):
    grads_and_vars[i] = (gv[0] - (1 / eta) * p, gv[1])

train_op = opt.apply_gradients(grads_and_vars)

Is there a better way to do this?

asked Jun 20 '16 by Lenar Hoyt



2 Answers

The update rule that the apply_gradients method actually applies depends on the specific optimizer. Take a look at the implementation of apply_gradients in the tf.train.Optimizer class here. It relies on the derived classes implementing the update rule in the methods _apply_dense and _apply_sparse. The update rule you are referring to is implemented by the GradientDescentOptimizer.
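For the GradientDescentOptimizer case, that dense update is the plain x ← x − η·g(x) step. Here is a small self-contained check of that behaviour (TF 1.x graph mode; the variable, loss, and learning rate below are made up purely for illustration):

import tensorflow as tf

# Illustrative values only: v and the loss are invented for this check.
v = tf.Variable([1.0, 2.0])
loss = tf.reduce_sum(tf.square(v))   # dloss/dv = 2 * v
opt = tf.train.GradientDescentOptimizer(learning_rate=0.1)

grads_and_vars = opt.compute_gradients(loss, [v])
train_op = opt.apply_gradients(grads_and_vars)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(train_op)
    print(sess.run(v))   # expect [0.8, 1.6], i.e. v - 0.1 * (2 * v)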

Regarding your desired positive additive update: If what you are calling opt is an instantiation of GradientDescentOptimizer, then you could indeed achieve what you want to do by

grads_and_vars = opt.compute_gradients(E, [v])
eta = opt._learning_rate
my_grads_and_vars = [(g - (1 / eta) * p, v) for g, v in grads_and_vars]
opt.apply_gradients(my_grads_and_vars)

The more elegant way to do this is probably to write a new optimizer (inheriting from tf.train.Optimizer) that implements your desired update rule directly.
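For illustration only, a minimal sketch of what such a subclass could look like (TF 1.x style; the class name, the constructor arguments, and the dense-only restriction are assumptions of this sketch, not something taken from the TensorFlow docs):

import tensorflow as tf

class AdditiveGradientDescent(tf.train.Optimizer):
    # Sketch: applies var <- var - eta * grad + p, for dense variables only.

    def __init__(self, learning_rate, p, use_locking=False, name="AdditiveGD"):
        super(AdditiveGradientDescent, self).__init__(use_locking, name)
        self._learning_rate = learning_rate
        self._p = p

    def _apply_dense(self, grad, var):
        eta = tf.cast(self._learning_rate, var.dtype.base_dtype)
        p = tf.cast(self._p, var.dtype.base_dtype)
        # var <- var - (eta * grad - p)  ==  var - eta * grad + p
        return tf.assign_sub(var, eta * grad - p, use_locking=self._use_locking)

    def _apply_sparse(self, grad, var):
        raise NotImplementedError("Sparse updates are not covered in this sketch.")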

answered Sep 20 '22 by lballes


You can also use the eager execution API.

import tensorflow as tf

tf.enable_eager_execution()
tfe = tf.contrib.eager

optimizer = tf.train.GradientDescentOptimizer(learning_rate)
grad = tfe.implicit_gradients(loss)
optimizer.apply_gradients(grad(model_fn, val_list))

Here is a concrete example:

import numpy as np
import tensorflow as tf

tf.enable_eager_execution()
tfe = tf.contrib.eager

W = tfe.Variable(np.random.randn())
b = tfe.Variable(np.random.randn())

def linear_regression(inputs):
    return inputs * W + b

def MSE(model_fn, inputs, labels):
    return tf.reduce_sum(tf.pow(model_fn(inputs) - labels, 2)) / (2 * n_samples)

optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.001)
grad = tfe.implicit_gradients(MSE)

# train_X and train_Y are your input data and labels
optimizer.apply_gradients(grad(linear_regression, train_X, train_Y))
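As a usage note: apply_gradients performs a single update step, so in practice you would call it in a loop. A minimal sketch of such a loop, assuming the definitions above plus train_X, train_Y, and n_samples as NumPy data (the step count and logging interval are arbitrary):

# One gradient step per iteration; values are illustrative only.
for step in range(1000):
    optimizer.apply_gradients(grad(linear_regression, train_X, train_Y))
    if step % 100 == 0:
        print("step", step, "loss", MSE(linear_regression, train_X, train_Y).numpy())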
answered Sep 21 '22 by Zonyue Li