I have a network made with InceptionNet, and for an input sample bx I want to compute the gradients of the model output w.r.t. a hidden layer's output. I have the following code:
bx = tf.reshape(x_batch[0, :, :, :], (1, 299, 299, 3))
with tf.GradientTape() as gtape:
    #gtape.watch(x)
    preds = model(bx)
    print(preds.shape, end='  ')
    class_idx = np.argmax(preds[0])
    print(class_idx, end='   ')
    class_output = model.output[:, class_idx]
    print(class_output, end='  ')
    last_conv_layer = model.get_layer('inception_v3').get_layer('mixed10')
    #gtape.watch(last_conv_layer)
    print(last_conv_layer)

grads = gtape.gradient(class_output, last_conv_layer.output)#[0]
print(grads)
But this will give None. I tried gtape.watch(bx) as well, but it still gives None.
Before trying GradientTape, I tried using tf.keras.backend.gradients, but that gave the following error:
RuntimeError: tf.gradients is not supported when eager execution is enabled. Use tf.GradientTape instead.
My model is as follows:
model.summary()
Model: "sequential_4"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
inception_v3 (Model)         (None, 1000)              23851784
_________________________________________________________________
dense_5 (Dense)              (None, 2)                 2002
=================================================================
Total params: 23,853,786
Trainable params: 23,819,354
Non-trainable params: 34,432
_________________________________________________________________
Any solution is appreciated. It doesn't have to be GradientTape, if there is any other way to compute these gradients.
TensorFlow "records" relevant operations executed inside the context of a tf. GradientTape onto a "tape". TensorFlow then uses that tape to compute the gradients of a "recorded" computation using reverse mode differentiation.
Eager execution is a powerful execution environment that evaluates operations immediately. It does not build graphs, and the operations return actual values instead of computational graphs to run later. With Eager execution, TensorFlow calculates the values of tensors as they occur in your code.
Behind the scenes, TensorFlow is a tensor library with automatic differentiation capability. Hence you can easily use it to solve a numerical optimization problem with gradient descent. In this post, you will learn how TensorFlow's automatic differentiation engine, autograd, works.
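For example, here is a minimal, self-contained sketch of that recording mechanism (a toy computation, not the model above):

import tensorflow as tf

x = tf.constant(3.0)
with tf.GradientTape() as tape:
    # Constants are not watched automatically, so watch `x` explicitly.
    tape.watch(x)
    y = x * x  # this operation is recorded on the tape
# Reverse-mode differentiation over the recorded ops: dy/dx = 2x = 6.0
dy_dx = tape.gradient(y, x)
print(dy_dx)  # tf.Tensor(6.0, shape=(), dtype=float32)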
I had the same problem as you. I'm not sure if this is the cleanest way to solve the problem, but here's my solution.
I think the problem is that you need to pass the actual return value of last_conv_layer.call(...) as an argument to tape.watch(). Since all layers are called sequentially within the scope of the model(bx) call, you'll have to somehow inject some code into this inner scope. I did this using the following decorator:
def watch_layer(layer, tape):
    """
    Make an intermediate hidden `layer` watchable by the `tape`.
    After calling this function, you can obtain the gradient with
    respect to the output of the `layer` by calling:

        grads = tape.gradient(..., layer.result)
    """
    def decorator(func):
        def wrapper(*args, **kwargs):
            # Store the result of `layer.call` internally.
            layer.result = func(*args, **kwargs)
            # From this point onwards, watch this tensor.
            tape.watch(layer.result)
            # Return the result to continue with the forward pass.
            return layer.result
        return wrapper

    layer.call = decorator(layer.call)
    return layer
In your example, I believe the following should then work for you:
bx = tf.reshape(x_batch[0, :, :, :], (1, 299, 299, 3))
last_conv_layer = model.get_layer('inception_v3').get_layer('mixed10')
with tf.GradientTape() as gtape:
    # Make the `last_conv_layer` watchable
    watch_layer(last_conv_layer, gtape)
    preds = model(bx)
    class_idx = np.argmax(preds[0])
    # Use the eager `preds` tensor here; the symbolic `model.output` is not on the tape
    class_output = preds[:, class_idx]
# Get the gradient w.r.t. the output of `last_conv_layer`
grads = gtape.gradient(class_output, last_conv_layer.result)
print(grads)
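Note that watch_layer replaces layer.call with the wrapped version permanently, so if you intend to run the model again without recording (or with a fresh tape), keep a reference to the original layer.call and restore it once the gradients have been computed.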
You can use the tape to compute the gradient of an output node with respect to a set of watchable objects. By default, trainable variables are watchable by the tape, and you can access the trainable variables of a specific layer by getting it by name and reading its trainable_variables property.
E.g. in the code below, I compute the gradients of the prediction only with respect to the variables of the first FC layer (named "fc1"), treating every other variable as a constant.
import tensorflow as tf

model = tf.keras.models.Sequential(
    [
        tf.keras.layers.Dense(10, input_shape=(3,), name="fc1", activation="relu"),
        tf.keras.layers.Dense(3, input_shape=(3,), name="fc2"),
    ]
)

# A batch of one sample matching the declared input_shape=(3,)
inputs = tf.ones((1, 3))

with tf.GradientTape() as tape:
    preds = model(inputs)

grads = tape.gradient(preds, model.get_layer("fc1").trainable_variables)
print(grads)
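Since the question also allows approaches other than watching the layer, another option is to build a second model that exposes the hidden feature map as an explicit output and differentiate against that output. The following is only a sketch: it assumes the inner 'inception_v3' sub-model is a functional Keras model (so its input/output tensors are accessible), reuses the layer names from the summary above ('mixed10', 'dense_5'), and takes bx to be the preprocessed input batch from the question:

import numpy as np
import tensorflow as tf

inception = model.get_layer('inception_v3')
last_conv_layer = inception.get_layer('mixed10')

# Second model that returns both the hidden feature map and the
# InceptionV3 predictions for the same input.
grad_model = tf.keras.models.Model(
    inputs=inception.input,
    outputs=[last_conv_layer.output, inception.output],
)

with tf.GradientTape() as tape:
    conv_output, inception_preds = grad_model(bx)
    # Re-apply the outer Dense layer from the Sequential model.
    preds = model.get_layer('dense_5')(inception_preds)
    class_idx = np.argmax(preds[0])
    class_output = preds[:, class_idx]

# Gradient of the selected class score w.r.t. the hidden feature map;
# for 299x299 inputs, mixed10 typically has shape (1, 8, 8, 2048).
grads = tape.gradient(class_output, conv_output)
print(grads.shape)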