Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

In TensorFlow 2.0 with eager-execution, how to compute the gradients of a network output wrt a specific layer?

I have a network made with InceptionNet, and for an input sample bx, I want to compute the gradients of the model output w.r.t. the hidden layer. I have the following code:

bx = tf.reshape(x_batch[0, :, :, :], (1, 299, 299, 3))


with tf.GradientTape() as gtape:
    #gtape.watch(x)
    preds = model(bx)
    print(preds.shape, end='  ')

    class_idx = np.argmax(preds[0])
    print(class_idx, end='   ')

    class_output = model.output[:, class_idx]
    print(class_output, end='   ')

    last_conv_layer = model.get_layer('inception_v3').get_layer('mixed10')
    #gtape.watch(last_conv_layer)
    print(last_conv_layer)


grads = gtape.gradient(class_output, last_conv_layer.output)#[0]
print(grads)

But, this will give None. I tried gtape.watch(bx) as well, but it still gives None.

Before trying GradientTape, I tried using tf.keras.backend.gradient but that gave an error as follows:

RuntimeError: tf.gradients is not supported when eager execution is enabled. Use tf.GradientTape instead.

My model is as follows:

model.summary()

Model: "sequential_4"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
inception_v3 (Model)         (None, 1000)              23851784  
_________________________________________________________________
dense_5 (Dense)              (None, 2)                 2002      
=================================================================
Total params: 23,853,786
Trainable params: 23,819,354
Non-trainable params: 34,432
_________________________________________________________________

Any solution is appreciated. It doesn't have to be GradientTape, if there is any other way to compute these gradients.

like image 209
Vahid Mirjalili Avatar asked Jun 06 '19 13:06

Vahid Mirjalili


People also ask

Can TensorFlow compute gradients?

TensorFlow "records" relevant operations executed inside the context of a tf. GradientTape onto a "tape". TensorFlow then uses that tape to compute the gradients of a "recorded" computation using reverse mode differentiation.

What is eager execution TF?

Eager execution is a powerful execution environment that evaluates operations immediately. It does not build graphs, and the operations return actual values instead of computational graphs to run later. With Eager execution, TensorFlow calculates the values of tensors as they occur in your code.

Does TensorFlow use Autograd?

Behind the scenes, TensorFlow is a tensor library with automatic differentiation capability. Hence you can easily use it to solve a numerical optimization problem with gradient descent. In this post, you will learn how TensorFlow's automatic differentiation engine, autograd, works.


2 Answers

I had the same problem as you. I'm not sure if this is the cleanest way to solve the problem, but here's my solution.

I think the problem is that you need to pass along the actual return value of last_conv_layer.call(...) as an argument to tape.watch(). Since all layers are called sequentially within the scope of the model(bx) call, you'll have to somehow inject some code into this inner scope. I did this using the following decorator:

def watch_layer(layer, tape):
    """
    Make an intermediate hidden `layer` watchable by the `tape`.
    After calling this function, you can obtain the gradient with
    respect to the output of the `layer` by calling:

        grads = tape.gradient(..., layer.result)

    """
    def decorator(func):
        def wrapper(*args, **kwargs):
            # Store the result of `layer.call` internally.
            layer.result = func(*args, **kwargs)
            # From this point onwards, watch this tensor.
            tape.watch(layer.result)
            # Return the result to continue with the forward pass.
            return layer.result
        return wrapper
    layer.call = decorator(layer.call)
    return layer

In your example, I believe the following should then work for you:

bx = tf.reshape(x_batch[0, :, :, :], (1, 299, 299, 3))
last_conv_layer = model.get_layer('inception_v3').get_layer('mixed10')
with tf.GradientTape() as gtape:
    # Make the `last_conv_layer` watchable
    watch_layer(last_conv_layer, gtape)  
    preds = model(bx)
    class_idx = np.argmax(preds[0])
    class_output = model.output[:, class_idx]
# Get the gradient w.r.t. the output of `last_conv_layer`
grads = gtape.gradient(class_output, last_conv_layer.result)  
print(grads)
like image 71
Fantasty Avatar answered Oct 04 '22 11:10

Fantasty


You can use the tape to compute the gradient of an output node, wrt a set of watchable objects. By default, trainable variables are watchable by the tape, and you can access the trainable variables of a specific layer by getting it by name and accessing to the trainable_variables property.

E.g. in the code below, I compute the gradients of the prediction, only with respect to the variables of the first FC layer (name "fc1") considering any other variable a constant.

import tensorflow as tf

model = tf.keras.models.Sequential(
    [
        tf.keras.layers.Dense(10, input_shape=(3,), name="fc1", activation="relu"),
        tf.keras.layers.Dense(3, input_shape=(3,), name="fc2"),
    ]
)

inputs = tf.ones((1, 299, 299, 3))

with tf.GradientTape() as tape:
    preds = model(inputs)

grads = tape.gradient(preds, model.get_layer("fc1").trainable_variables)
print(grads)
like image 44
nessuno Avatar answered Oct 04 '22 11:10

nessuno