I would like to replace or modify the gradient of an op or portion of the graph in TensorFlow. It would be ideal if I can use the existing gradient in the calculation.
In some ways this is the opposite of what tf.stop_gradient() does: instead of adding a calculation which is ignored when calculating gradients, I want a calculation which is only used when calculating gradients.
A simple example would be something which simply scales gradients by multiplying them with a constant (but does not multiply the forward calculation by a constant). Another example would be something which clips the gradients to a given range.
Behind the scenes, TensorFlow is a tensor library with automatic differentiation capability, which is why you can use it to solve numerical optimization problems with gradient descent. The key to modifying gradients is understanding how its automatic differentiation engine works.
Applying gradient clipping in TensorFlow models is straightforward: you only need to pass the parameter to the optimizer. All Keras optimizers accept `clipnorm` and `clipvalue` parameters that can be used to clip gradients.
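For example, a minimal sketch of that optimizer-level clipping (assuming tf.keras; the learning rate, clip bounds, and model shape are arbitrary placeholders):

    import tensorflow as tf

    # Clip each individual gradient value to [-0.5, 0.5] ...
    opt_value = tf.keras.optimizers.SGD(learning_rate=0.01, clipvalue=0.5)
    # ... or rescale each gradient so its norm is at most 1.0
    opt_norm = tf.keras.optimizers.SGD(learning_rate=0.01, clipnorm=1.0)

    model = tf.keras.Sequential([tf.keras.Input(shape=(4,)), tf.keras.layers.Dense(1)])
    model.compile(optimizer=opt_value, loss="mse")

Note that this clips gradients globally at the optimizer level; it does not let you change the gradient of a single op, which is what the custom-gradient approaches below do.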
TensorFlow "records" relevant operations executed inside the context of a tf.GradientTape onto a "tape". TensorFlow then uses that tape to compute the gradients of a "recorded" computation using reverse-mode differentiation.
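A minimal illustration of the tape in TensorFlow 2.x:

    import tensorflow as tf

    x = tf.Variable(3.0)
    with tf.GradientTape() as tape:
        y = x * x                    # recorded on the tape
    dy_dx = tape.gradient(y, x)      # reverse-mode differentiation: dy/dx = 2x
    print(dy_dx.numpy())             # 6.0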
stop_gradient() is an operation that acts as the identity function in the forward direction but stops the accumulated gradient from flowing through that operator in the backward direction.
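A quick sketch of that behaviour in TensorFlow 2.x eager mode:

    import tensorflow as tf

    x = tf.Variable(2.0)
    with tf.GradientTape() as tape:
        # forward value is x * x = 4.0, but the second factor is treated as a constant
        y = x * tf.stop_gradient(x)
    dy_dx = tape.gradient(y, x)      # only the first factor contributes: dy/dx = 2.0, not 4.0
    print(y.numpy(), dy_dx.numpy())  # 4.0 2.0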
For TensorFlow 1.7 and TensorFlow 2.0, look at the edit below.
First define your custom gradient:
@tf.RegisterGradient("CustomGrad") def _const_mul_grad(unused_op, grad): return 5.0 * grad
Since you want nothing to happen in the forward pass, override the gradient of an identity operation with your new gradient:
    g = tf.get_default_graph()
    with g.gradient_override_map({"Identity": "CustomGrad"}):
        output = tf.identity(input, name="Identity")
Here is a working example with a layer that clips gradients in the backwards pass and does nothing in the forwards pass, using the same method:
    import tensorflow as tf

    @tf.RegisterGradient("CustomClipGrad")
    def _clip_grad(unused_op, grad):
        return tf.clip_by_value(grad, -0.1, 0.1)

    input = tf.Variable([3.0], dtype=tf.float32)

    g = tf.get_default_graph()
    with g.gradient_override_map({"Identity": "CustomClipGrad"}):
        output_clip = tf.identity(input, name="Identity")
    grad_clip = tf.gradients(output_clip, input)

    # output without gradient clipping in the backwards pass for comparison:
    output = tf.identity(input)
    grad = tf.gradients(output, input)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        print("with clipping:", sess.run(grad_clip)[0])
        print("without clipping:", sess.run(grad)[0])
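Since the gradient of tf.identity with respect to input is 1.0, this should print [0.1] for the clipped version and [1.0] for the unclipped one, while both forward outputs stay equal to the input value.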
Edit for TensorFlow 1.7 and TensorFlow 2.0
Since TensorFlow 1.7 there is a new way to redefine the gradient with shorter syntax, which also works with TensorFlow 2.0. It also allows you to redefine the gradient of multiple operations at the same time. Here are the examples from above, rewritten for TensorFlow 1.7 and TensorFlow 2.0:
Layer that scales gradients in the backward pass:
    @tf.custom_gradient
    def scale_grad_layer(x):
        def grad(dy):
            return 5.0 * dy
        return tf.identity(x), grad
Example with a layer that clips gradients in the backward pass:
    @tf.custom_gradient
    def clip_grad_layer(x):
        def grad(dy):
            return tf.clip_by_value(dy, -0.1, 0.1)
        return tf.identity(x), grad
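A short usage check for both layers, using the scale_grad_layer and clip_grad_layer functions defined above (a sketch, assuming TensorFlow 2.x eager execution):

    import tensorflow as tf

    x = tf.constant(3.0)
    with tf.GradientTape(persistent=True) as tape:
        tape.watch(x)
        y_scaled = scale_grad_layer(x)   # forward pass is just the identity
        y_clipped = clip_grad_layer(x)
    print(tape.gradient(y_scaled, x).numpy())   # 5.0 -- upstream gradient of 1.0 scaled by 5
    print(tape.gradient(y_clipped, x).numpy())  # 0.1 -- upstream gradient of 1.0 clipped to 0.1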
Assuming the forward computation is
y = f(x)
and you want it to backpropagate as if it were
y = b(x)
then a simple hack is:
y = b(x) + tf.stop_gradient(f(x) - b(x))
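For example, this trick gives a straight-through estimator for a non-differentiable op such as rounding: the forward pass computes f(x) = round(x) while the backward pass behaves like b(x) = x (a sketch in TensorFlow 2.x):

    import tensorflow as tf

    x = tf.Variable(0.7)
    with tf.GradientTape() as tape:
        # forward: f(x) = round(x); backward: behaves like b(x) = x
        y = x + tf.stop_gradient(tf.round(x) - x)
    dy_dx = tape.gradient(y, x)
    print(y.numpy(), dy_dx.numpy())  # 1.0 1.0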