What's the difference between GradientTape, implicit_gradients, gradients_function and implicit_value_and_gradients?

Tags:

tensorflow

I'm trying to switch to TensorFlow eager mode and I find the documentation of GradientTape, implicit_gradients, gradients_function and implicit_value_and_gradients confusing.

What's the difference between them? When should I use one over the other?

The introductory documentation does not mention the implicit* functions at all, yet almost all of the examples in the TensorFlow repository seem to use that method for computing gradients.

Milad asked Apr 30 '18 10:04



1 Answer

There are 4 ways to automatically compute gradients when eager execution is enabled (actually, they also work in graph mode):

  • A tf.GradientTape context records computations so that you can call tape.gradient() to get the gradients of any tensor computed while recording, with respect to any trainable variable.
  • tfe.gradients_function() takes a function (say f()) and returns a gradient function (say fg()) that can compute the gradients of the outputs of f() with respect to the parameters of f() (or a subset of them).
  • tfe.implicit_gradients() is very similar, but fg() computes the gradients of the outputs of f() with respect to all trainable variables these outputs depend on.
  • tfe.implicit_value_and_gradients() is almost identical, but fg() also returns the output of the function f().

Usually, in machine learning, you will want to compute the gradients of the loss with respect to the model parameters (i.e., the variables), and you will generally also be interested in the value of the loss itself. For this use case, the simplest and most efficient options are tf.GradientTape and tfe.implicit_value_and_gradients() (the other two options do not give you the value of the loss itself, so if you need it, it will require extra computations). I personally prefer tfe.implicit_value_and_gradients() when writing production code, and tf.GradientTape when experimenting in a Jupyter notebook.

Edit: In TF 2.0, it seems that only tf.GradientTape remains. Maybe the other functions will be added back, but I wouldn't count on it.
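For reference, here is what the recommended value-plus-gradients pattern looks like with tf.GradientTape alone, a minimal sketch assuming TF 2.x (where eager execution is the default and no enable_eager_execution() call is needed):

```python
import tensorflow as tf  # assumes TF 2.x

w1 = tf.Variable(2.0)
w2 = tf.Variable(3.0)

def weighted_sum(x1, x2):
    return w1 * x1 + w2 * x2

# Record the forward pass; the loss value comes for free as the
# tensor computed inside the context.
with tf.GradientTape() as tape:
    s = weighted_sum(5., 7.)

[w1_grad, w2_grad] = tape.gradient(s, [w1, w2])
print(s.numpy())        # 31.0
print(w1_grad.numpy())  # 5.0 = x1
```

So in TF 2.x the single tape covers the same use case as tfe.implicit_value_and_gradients(): one forward pass gives you both the value and the gradients.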

Detailed example

Let's create a small function to highlight the differences:

import tensorflow as tf
import tensorflow.contrib.eager as tfe
tf.enable_eager_execution()

w1 = tfe.Variable(2.0)
w2 = tfe.Variable(3.0)

def weighted_sum(x1, x2):
    return w1 * x1 + w2 * x2

s = weighted_sum(5., 7.)
print(s.numpy()) # 31

Using tf.GradientTape

Within a GradientTape context, all operations are recorded, and you can then compute the gradients of any tensor computed within the context with respect to any trainable variable. For example, this code computes s within the GradientTape context, and then computes the gradient of s with respect to w1. Since s = w1 * x1 + w2 * x2, the gradient of s with respect to w1 is x1:

with tf.GradientTape() as tape:
    s = weighted_sum(5., 7.)

[w1_grad] = tape.gradient(s, [w1])
print(w1_grad.numpy()) # 5.0 = gradient of s with respect to w1 = x1
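As a quick sanity check that does not depend on TensorFlow at all, a finite-difference estimate agrees with what the tape reports: since s = w1 * x1 + w2 * x2, nudging w1 by a tiny eps changes s by roughly eps * x1 (the helper s_of_w1 below exists only for this illustration):

```python
# Finite-difference check of the analytic gradient, in plain Python.
# s = w1 * x1 + w2 * x2, so ds/dw1 should equal x1 = 5.0.
def s_of_w1(w1, w2=3.0, x1=5.0, x2=7.0):
    return w1 * x1 + w2 * x2

eps = 1e-6
numeric_grad = (s_of_w1(2.0 + eps) - s_of_w1(2.0)) / eps
print(round(numeric_grad, 3))  # 5.0, matching tape.gradient
```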

Using tfe.gradients_function()

This function returns another function that can compute the gradients of a function's returned value with respect to its parameters. For example, we can use it to define a function that will compute the gradients of s with respect to x1 and x2:

grad_fn = tfe.gradients_function(weighted_sum)
x1_grad, x2_grad = grad_fn(5., 7.)
print(x1_grad.numpy()) # 2.0 = gradient of s with respect to x1 = w1

In the context of optimization, it would make more sense to compute gradients with respect to variables that we can tweak. For this, we can change the weighted_sum() function to take w1 and w2 as parameters as well, and tell tfe.gradients_function() to only consider the parameters named "w1" and "w2":

def weighted_sum_with_weights(w1, x1, w2, x2):
    return w1 * x1 + w2 * x2

grad_fn = tfe.gradients_function(weighted_sum_with_weights, params=["w1", "w2"])
[w1_grad, w2_grad] = grad_fn(w1, 5., w2, 7.)
print(w2_grad.numpy()) # 7.0 = gradient of s with respect to w2 = x2

Using tfe.implicit_gradients()

This function returns another function that can compute the gradients of a function's returned value with respect to all trainable variables it depends on. Going back to the first version of weighted_sum(), we can use it to compute the gradients of s with respect to w1 and w2 without having to pass these variables explicitly. Note that the gradient function returns a list of gradient/variable pairs:

grad_fn = tfe.implicit_gradients(weighted_sum)
[(w1_grad, w1_var), (w2_grad, w2_var)] = grad_fn(5., 7.)
print(w1_grad.numpy()) # 5.0 = gradient of s with respect to w1 = x1

assert w1_var is w1
assert w2_var is w2

This function does seem like the simplest and most useful option, since generally we are interested in computing the gradients of the loss with respect to the model parameters (i.e., the variables). Note: try making w1 untrainable (w1 = tfe.Variable(2., trainable=False)) and redefining weighted_sum(), and you will see that grad_fn only returns the gradient of s with respect to w2.
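The same trainable-only filtering applies to tf.GradientTape, which also watches only trainable variables by default. A minimal sketch, assuming TF 2.x (where the tfe.* functions are gone): the tape simply returns None for the untrainable variable:

```python
import tensorflow as tf  # assumes TF 2.x

w1 = tf.Variable(2.0, trainable=False)  # not watched automatically
w2 = tf.Variable(3.0)

with tf.GradientTape() as tape:
    s = w1 * 5. + w2 * 7.

w1_grad, w2_grad = tape.gradient(s, [w1, w2])
print(w1_grad)          # None: w1 is untrainable, so it was not watched
print(w2_grad.numpy())  # 7.0
```

To get a gradient for w1 anyway, you would call tape.watch(w1) inside the context.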

Using tfe.implicit_value_and_gradients()

This function is almost identical to implicit_gradients() except the function it creates also returns the result of the function being differentiated (in this case weighted_sum()):

grad_fn = tfe.implicit_value_and_gradients(weighted_sum)
s, [(w1_grad, w1_var), (w2_grad, w2_var)] = grad_fn(5., 7.)
print(s.numpy()) # 31.0 = s = w1 * x1 + w2 * x2

When you need both the output of a function and its gradients, this function can give you a nice performance boost, since you get the output of the function for free when computing the gradients using autodiff.
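In TF 2.x, where implicit_value_and_gradients() is no longer available, the same value-plus-gradients training step can be sketched with tf.GradientTape and an optimizer (the SGD optimizer and the learning rate of 0.1 here are illustrative assumptions, not part of the original answer):

```python
import tensorflow as tf  # assumes TF 2.x

w1 = tf.Variable(2.0)
w2 = tf.Variable(3.0)
optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)

# One training step: forward pass gives the loss value for free,
# the tape gives the gradients, and the optimizer applies them.
with tf.GradientTape() as tape:
    loss = w1 * 5. + w2 * 7.   # stand-in "loss" for illustration

grads = tape.gradient(loss, [w1, w2])
optimizer.apply_gradients(zip(grads, [w1, w2]))

print(loss.numpy())  # 31.0
print(w1.numpy())    # 1.5 = 2.0 - 0.1 * 5.0
```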

MiniQuark answered Oct 22 '22 02:10