I am confused about the difference between apply_gradients and minimize of an optimizer in TensorFlow. For example,
    optimizer = tf.train.AdamOptimizer(1e-3)
    grads_and_vars = optimizer.compute_gradients(cnn.loss)
    train_op = optimizer.apply_gradients(grads_and_vars, global_step=global_step)
and
    optimizer = tf.train.AdamOptimizer(1e-3)
    train_op = optimizer.minimize(cnn.loss, global_step=global_step)
Are they really the same?
If I want to decay the learning rate, can I use the following codes?
    global_step = tf.Variable(0, name="global_step", trainable=False)
    starter_learning_rate = 1e-3
    learning_rate = tf.train.exponential_decay(starter_learning_rate, global_step,
                                               100, FLAGS.decay_rate, staircase=True)
    # Passing global_step to apply_gradients() will increment it at each step.
    optimizer = tf.train.AdamOptimizer(learning_rate)
    grads_and_vars = optimizer.compute_gradients(cnn.loss)
    train_op = optimizer.apply_gradients(grads_and_vars, global_step=global_step)
Thanks for your help!
Calling minimize() takes care of both computing the gradients and applying them to the variables. If you want to process the gradients before applying them, you can instead use the optimizer in three steps: compute the gradients with tf.GradientTape, process the gradients as you wish, and apply the processed gradients with apply_gradients().
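A minimal TF 2.x sketch of those three steps; the model, data, and clipping threshold here are made up purely for illustration:

    import tensorflow as tf

    # Illustrative model and data; any Keras model and loss would do.
    model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
    optimizer = tf.keras.optimizers.Adam(1e-3)
    x = tf.random.normal([8, 4])
    y = tf.random.normal([8, 1])

    # 1. Compute the gradients with tf.GradientTape.
    with tf.GradientTape() as tape:
        loss = tf.reduce_mean(tf.square(model(x) - y))
    grads = tape.gradient(loss, model.trainable_variables)

    # 2. Process the gradients as you wish (here: clip by global norm).
    grads, _ = tf.clip_by_global_norm(grads, 5.0)

    # 3. Apply the processed gradients with apply_gradients().
    optimizer.apply_gradients(zip(grads, model.trainable_variables))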
You can easily tell from the link https://www.tensorflow.org/get_started/get_started (the tf.train API part) that they actually do the same job. The difference is that if you use the separate calls (compute_gradients and apply_gradients), you can apply other mechanisms between them, such as gradient clipping.
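For instance, a small TF 1.x-style sketch of clipping the gradients between the two calls; the toy variable and loss below just stand in for the question's cnn.loss:

    import tensorflow as tf  # assumes TF 1.x, as in the question

    # Toy stand-ins for the question's model loss and global step.
    w = tf.Variable([0.0], name="w")
    loss = tf.reduce_mean(tf.square(w - 1.0))
    global_step = tf.Variable(0, name="global_step", trainable=False)

    optimizer = tf.train.AdamOptimizer(1e-3)

    # compute_gradients returns a list of (gradient, variable) pairs ...
    grads_and_vars = optimizer.compute_gradients(loss)

    # ... which can be modified before apply_gradients, e.g. clipped by norm.
    clipped = [(tf.clip_by_norm(g, 5.0), v) for g, v in grads_and_vars if g is not None]

    train_op = optimizer.apply_gradients(clipped, global_step=global_step)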
The documentation says minimize uses tf.GradientTape and then apply_gradients:
Minimize loss by updating var_list.
This method simply computes gradient using tf.GradientTape and calls apply_gradients(). If you want to process the gradient before applying then call tf.GradientTape and apply_gradients() explicitly instead of using this function.
So minimize actually uses apply_gradients, just like:
    def minimize(self, loss, var_list, grad_loss=None, name=None, tape=None):
        grads_and_vars = self._compute_gradients(
            loss, var_list=var_list, grad_loss=grad_loss, tape=tape)
        return self.apply_gradients(grads_and_vars, name=name)
In your example, you use compute_gradients and apply_gradients, which is indeed valid. Nowadays, however, compute_gradients has been made private, so it is no longer good practice to use it, and for this reason the function no longer appears in the documentation.
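With the current public API you either compute the gradients yourself with tf.GradientTape (as in the first sketch above) or call minimize() directly. A minimal sketch of the latter, assuming the TF 2.x Keras optimizer whose minimize() source is quoted above, with a made-up variable and loss:

    import tensorflow as tf  # TF 2.x

    # Made-up variable and loss just to show the call.
    w = tf.Variable([0.0])
    optimizer = tf.keras.optimizers.Adam(1e-3)

    # Passing a callable loss lets minimize() record the tape itself and
    # then call apply_gradients() internally, as in the source shown above.
    loss_fn = lambda: tf.reduce_mean(tf.square(w - 1.0))
    optimizer.minimize(loss_fn, var_list=[w])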