I am confused about the difference between apply_gradients and minimize of an optimizer in TensorFlow. For example,
    optimizer = tf.train.AdamOptimizer(1e-3)
    grads_and_vars = optimizer.compute_gradients(cnn.loss)
    train_op = optimizer.apply_gradients(grads_and_vars, global_step=global_step)
and
    optimizer = tf.train.AdamOptimizer(1e-3)
    train_op = optimizer.minimize(cnn.loss, global_step=global_step)
Are they really the same?
If I want to decay the learning rate, can I use the following codes?
    global_step = tf.Variable(0, name="global_step", trainable=False)
    starter_learning_rate = 1e-3
    learning_rate = tf.train.exponential_decay(starter_learning_rate, global_step,
                                               100, FLAGS.decay_rate, staircase=True)
    # Passing global_step to apply_gradients() will increment it at each step.
    optimizer = tf.train.AdamOptimizer(learning_rate)
    grads_and_vars = optimizer.compute_gradients(cnn.loss)
    train_op = optimizer.apply_gradients(grads_and_vars, global_step=global_step)
Thanks for your help!
Calling minimize() takes care of both computing the gradients and applying them to the variables. If you want to process the gradients before applying them, you can instead use the optimizer in three steps: compute the gradients with tf.GradientTape, process the gradients as you wish, and apply the processed gradients with apply_gradients().
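A minimal TF 2.x sketch of those three steps; the model, data, and clipping threshold here are made up purely for illustration:

    import tensorflow as tf

    # Illustrative model and data; any Keras model and loss would do.
    model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
    optimizer = tf.keras.optimizers.Adam(1e-3)
    x = tf.random.normal([8, 4])
    y = tf.random.normal([8, 1])

    # 1. Compute the gradients with tf.GradientTape.
    with tf.GradientTape() as tape:
        loss = tf.reduce_mean(tf.square(model(x) - y))
    grads = tape.gradient(loss, model.trainable_variables)

    # 2. Process the gradients as you wish (here: clip by global norm).
    grads, _ = tf.clip_by_global_norm(grads, 5.0)

    # 3. Apply the processed gradients with apply_gradients().
    optimizer.apply_gradients(zip(grads, model.trainable_variables))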
You can easily tell from the link https://www.tensorflow.org/get_started/get_started (the tf.train API part) that they actually do the same job. The difference is that if you use the separate calls (compute_gradients and apply_gradients), you can apply other mechanisms between them, such as gradient clipping.
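For instance, a small TF 1.x-style sketch of clipping the gradients between the two calls; the toy variable and loss below just stand in for the question's cnn.loss:

    import tensorflow as tf  # assumes TF 1.x, as in the question

    # Toy stand-ins for the question's model loss and global step.
    w = tf.Variable([0.0], name="w")
    loss = tf.reduce_mean(tf.square(w - 1.0))
    global_step = tf.Variable(0, name="global_step", trainable=False)

    optimizer = tf.train.AdamOptimizer(1e-3)

    # compute_gradients returns a list of (gradient, variable) pairs ...
    grads_and_vars = optimizer.compute_gradients(loss)

    # ... which can be modified before apply_gradients, e.g. clipped by norm.
    clipped = [(tf.clip_by_norm(g, 5.0), v) for g, v in grads_and_vars if g is not None]

    train_op = optimizer.apply_gradients(clipped, global_step=global_step)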
The documentation says minimize uses tf.GradientTape and then apply_gradients:
Minimize loss by updating var_list.
This method simply computes gradient using tf.GradientTape and calls apply_gradients(). If you want to process the gradient before applying then call tf.GradientTape and apply_gradients() explicitly instead of using this function.
So minimize actually uses apply_gradients, just like:
    def minimize(self, loss, var_list, grad_loss=None, name=None, tape=None):
        grads_and_vars = self._compute_gradients(
            loss, var_list=var_list, grad_loss=grad_loss, tape=tape)
        return self.apply_gradients(grads_and_vars, name=name)
In your example, you use compute_gradients and apply_gradients, which is indeed valid. Nowadays, however, compute_gradients has been made private, so it is no longer good practice to use it, and for this reason the function no longer appears in the documentation.
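With the current public API you either compute the gradients yourself with tf.GradientTape (as in the first sketch above) or call minimize() directly. A minimal sketch of the latter, assuming the TF 2.x Keras optimizer whose minimize() source is quoted above, with a made-up variable and loss:

    import tensorflow as tf  # TF 2.x

    # Made-up variable and loss just to show the call.
    w = tf.Variable([0.0])
    optimizer = tf.keras.optimizers.Adam(1e-3)

    # Passing a callable loss lets minimize() record the tape itself and
    # then call apply_gradients() internally, as in the source shown above.
    loss_fn = lambda: tf.reduce_mean(tf.square(w - 1.0))
    optimizer.minimize(loss_fn, var_list=[w])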