
Difference between `apply_gradients` and `minimize` of optimizer in tensorflow

Tags:

tensorflow

I am confused about the difference between `apply_gradients` and `minimize` of the optimizer in TensorFlow. For example:

```python
optimizer = tf.train.AdamOptimizer(1e-3)
grads_and_vars = optimizer.compute_gradients(cnn.loss)
train_op = optimizer.apply_gradients(grads_and_vars, global_step=global_step)
```

and

```python
optimizer = tf.train.AdamOptimizer(1e-3)
train_op = optimizer.minimize(cnn.loss, global_step=global_step)
```

Are they actually the same?

If I want to decay the learning rate, can I use the following code?

```python
global_step = tf.Variable(0, name="global_step", trainable=False)
starter_learning_rate = 1e-3
learning_rate = tf.train.exponential_decay(starter_learning_rate, global_step,
                                           100, FLAGS.decay_rate, staircase=True)
optimizer = tf.train.AdamOptimizer(learning_rate)
grads_and_vars = optimizer.compute_gradients(cnn.loss)
# Passing global_step to apply_gradients() will increment it at each step.
train_op = optimizer.apply_gradients(grads_and_vars, global_step=global_step)
```

Thanks for your help!

asked Aug 03 '17 by Panfeng Li



2 Answers

You can see from https://www.tensorflow.org/get_started/get_started (the tf.train API section) that they actually do the same job. The difference is that if you use the separate functions (`compute_gradients` and `apply_gradients`), you can apply other mechanisms between them, such as gradient clipping.
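For example, here is a minimal sketch of clipping gradients between the two calls (TF 1.x style, reusing `cnn.loss` and `global_step` from the question; the clip norm of 5.0 is an arbitrary choice):

```python
optimizer = tf.train.AdamOptimizer(1e-3)
grads_and_vars = optimizer.compute_gradients(cnn.loss)
# Clip each gradient's norm before applying; skip variables with no gradient.
clipped = [(tf.clip_by_norm(g, 5.0), v)
           for g, v in grads_and_vars if g is not None]
train_op = optimizer.apply_gradients(clipped, global_step=global_step)
```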

answered Oct 11 '22 by Yanghoon


Here the documentation says that minimize uses tf.GradientTape and then apply_gradients:

Minimize loss by updating var_list.

This method simply computes gradient using tf.GradientTape and calls apply_gradients(). If you want to process the gradient before applying then call tf.GradientTape and apply_gradients() explicitly instead of using this function.

So minimize actually calls apply_gradients, like this:

```python
def minimize(self, loss, var_list, grad_loss=None, name=None, tape=None):
    grads_and_vars = self._compute_gradients(
        loss, var_list=var_list, grad_loss=grad_loss, tape=tape)
    return self.apply_gradients(grads_and_vars, name=name)
```

In your example, you use compute_gradients and apply_gradients. This is indeed valid, but nowadays compute_gradients has been made private (it is `_compute_gradients` in the current Optimizer class), so it is no longer good practice to use it. For this reason, the function no longer appears in the documentation.
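For completeness, here is a minimal sketch of the recommended TF 2.x pattern: compute gradients with tf.GradientTape, optionally process them, then call apply_gradients. The `model`, `loss_fn`, `x`, and `y` names are hypothetical placeholders, and the clip norm is again an arbitrary choice:

```python
import tensorflow as tf

optimizer = tf.keras.optimizers.Adam(1e-3)

def train_step(model, loss_fn, x, y):
    # Record the forward pass so gradients can be computed.
    with tf.GradientTape() as tape:
        loss = loss_fn(y, model(x, training=True))
    grads = tape.gradient(loss, model.trainable_variables)
    # Process the gradients before applying them, e.g. clip by norm.
    grads = [tf.clip_by_norm(g, 5.0) for g in grads]
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss
```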

answered Oct 11 '22 by Agustin Barrachina