How to use stop_gradient in Tensorflow

Tags:

tensorflow

I'm wondering how to use stop_gradient in tensorflow, and the documentation is not clear to me.

I'm currently using stop_gradient to produce the gradient of the loss function w.r.t. the word embeddings in a CBOW word2vec model. I want to just get the value, and not do backpropagation (as I'm generating adversarial examples).

Currently, I'm using the code:

    lossGrad = gradients.gradients(loss, embed)[0]
    real_grad = lossGrad.eval(feed_dict)

~~But when I run this, it does the backpropagation anyway!~~ What am I doing wrong, and just as importantly, how can I fix this?

CLARIFICATION: To clarify, by "backpropagation" I mean "calculating values and updating model parameters".

UPDATE

If I run the two lines above after the first training step, then I get a different loss after 100 training steps than when I don't run those two lines. I might be fundamentally misunderstanding something about TensorFlow.

I've tried using set_random_seed both at the beginning of the graph declaration and before each training step. The total loss is consistent between multiple runs, but not between including and excluding those two lines. So if it's not the RNG causing the disparity, and it's not unanticipated updating of the model parameters between training steps, do you have any idea what would cause this behavior?

SOLUTION

Welp, it's a bit late, but here's how I solved it. I only wanted to optimize over some, but not all, variables. I thought that the way to prevent optimizing some variables would be to use stop_gradient, but I never found a way to make that work. Maybe there is a way, but what worked for me was to adjust my optimizer to only optimize over a list of variables. So instead of:

    opt = tf.train.GradientDescentOptimizer(learning_rate=eta)
    train_op = opt.minimize(loss)

I used:

    opt = tf.train.GradientDescentOptimizer(learning_rate=eta)
    train_op = opt.minimize(loss, var_list=[variables to optimize over])

This prevented opt from updating the variables not in var_list. Hopefully it works for you, too!
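For anyone who wants to check this behavior, here is a minimal sketch of the var_list approach. The variable names (trainable_w, frozen_w), the toy loss, and the value of eta are made up for illustration, and it assumes a TensorFlow install that provides tensorflow.compat.v1 so the TF1-style graph code still runs.

```python
import numpy as np
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()  # use TF1-style graphs and sessions

graph = tf.Graph()
with graph.as_default():
    # Hypothetical variables: only trainable_w should be optimized.
    trainable_w = tf.get_variable("trainable_w", shape=[2],
                                  initializer=tf.ones_initializer())
    frozen_w = tf.get_variable("frozen_w", shape=[2],
                               initializer=tf.ones_initializer())
    # Toy loss that depends on both variables.
    loss = tf.reduce_sum(tf.square(trainable_w * frozen_w))

    eta = 0.1  # assumed learning rate
    opt = tf.train.GradientDescentOptimizer(learning_rate=eta)
    # Only the variables listed in var_list receive updates.
    train_op = opt.minimize(loss, var_list=[trainable_w])

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        trainable_before, frozen_before = sess.run([trainable_w, frozen_w])
        sess.run(train_op)
        trainable_after, frozen_after = sess.run([trainable_w, frozen_w])

print("frozen_w unchanged:", np.array_equal(frozen_before, frozen_after))
print("trainable_w changed:", not np.array_equal(trainable_before, trainable_after))
```

Even though the loss depends on frozen_w, one training step should leave it untouched, because the optimizer only builds update ops for the variables it was given.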


asked Nov 16 '15 03:11

Alex Sax

2 Answers

tf.stop_gradient provides a way to exclude part of an expression from gradient computation, so that the variables behind it receive no gradient during back-propagation.

For example, in the code below, we have three variables, w1, w2, w3, and an input x. The loss is square(x.dot(w1) - x.dot(w2 * w3)), averaged over the batch. We want to minimize this loss with respect to w1 but keep w2 and w3 fixed. To achieve this, we can wrap the second term in tf.stop_gradient, i.e. tf.stop_gradient(tf.matmul(x, w2*w3)).

In the figure below, I plotted how w1, w2, and w3 evolve from their initial values as a function of training iterations. It can be seen that w2 and w3 remain fixed while w1 changes until it becomes equal to w2 * w3.

[Figure: w1, w2, and w3 plotted against training iterations; only w1 is learned, while w2 and w3 stay fixed.]

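    import numpy as np
    # TensorFlow 1.x graph API. On TensorFlow 2.x, import tensorflow.compat.v1
    # as tf and call tf.disable_eager_execution() instead.
    import tensorflow as tf

    w1 = tf.get_variable("w1", shape=[5, 1], initializer=tf.truncated_normal_initializer())
    w2 = tf.get_variable("w2", shape=[5, 1], initializer=tf.truncated_normal_initializer())
    w3 = tf.get_variable("w3", shape=[5, 1], initializer=tf.truncated_normal_initializer())
    x = tf.placeholder(tf.float32, shape=[None, 5], name="x")

    a1 = tf.matmul(x, w1)
    a2 = tf.matmul(x, w2 * w3)
    # Treat a2 as a constant during backpropagation, so no gradient reaches w2 or w3.
    a2 = tf.stop_gradient(a2)
    loss = tf.reduce_mean(tf.square(a1 - a2))

    optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.1)
    # w2 and w3 come back with a gradient of None, and apply_gradients skips them.
    gradients = optimizer.compute_gradients(loss)
    train_op = optimizer.apply_gradients(gradients)

    # Illustrative training loop (not part of the original answer);
    # the random inputs and the step count are assumptions.
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for _ in range(100):
            sess.run(train_op, feed_dict={x: np.random.normal(size=[32, 5])})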

answered Sep 22 '22 13:09

Abhishek Mishra

tf.gradients(loss, embed) computes the partial derivative of the tensor loss with respect to the tensor embed. TensorFlow computes this partial derivative by backpropagation, so it is expected behavior that evaluating the result of tf.gradients(...) performs backpropagation. However, evaluating that tensor does not perform any variable updates, because the expression does not include any assignment operations.
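A small sketch can make this concrete. The embed variable and loss below are toy stand-ins for the ones in the question, not the asker's actual model, and the code assumes tensorflow.compat.v1 is available so the TF1-style session API runs.

```python
import numpy as np
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()  # use TF1-style graphs and sessions

graph = tf.Graph()
with graph.as_default():
    # Hypothetical stand-ins for the question's `embed` and `loss`.
    embed = tf.get_variable("embed", shape=[4, 3],
                            initializer=tf.ones_initializer())
    loss = tf.reduce_sum(tf.square(embed))
    # Symbolic gradient; for this toy loss, d(loss)/d(embed) = 2 * embed.
    loss_grad = tf.gradients(loss, embed)[0]

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        before = sess.run(embed)
        grad_value = sess.run(loss_grad)  # backpropagation runs, values only
        after = sess.run(embed)

print("embed unchanged:", np.array_equal(before, after))
```

The gradient is computed and returned as a plain array, while the variable keeps its value, since nothing in the evaluated expression assigns to it.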

tf.stop_gradient() is an operation that acts as the identity function in the forward direction but stops the accumulated gradient from flowing through that operator in the backward direction. It does not prevent backpropagation altogether, but instead prevents an individual tensor from contributing to the gradients that are computed for an expression. The documentation for the operation has more details, including when to use it.
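As a rough illustration of the "identity forward, blocked backward" behavior, the sketch below uses a made-up scalar expression in which the same term appears once normally and once wrapped in tf.stop_gradient. It assumes tensorflow.compat.v1 is available.

```python
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()  # use TF1-style graphs and sessions

graph = tf.Graph()
with graph.as_default():
    x = tf.constant(3.0)
    # Forward pass: same value as x * x. Backward pass: treated as a constant.
    blocked = tf.stop_gradient(x * x)
    y = x * x + blocked
    # Only the unblocked x * x term contributes, so dy/dx = 2 * x.
    dy_dx = tf.gradients(y, x)[0]

    with tf.Session() as sess:
        y_value, blocked_value, grad_value = sess.run([y, blocked, dy_dx])

print("y =", y_value)
print("blocked term =", blocked_value)
print("dy/dx =", grad_value)
```

Without the tf.stop_gradient call, both terms would contribute and the gradient would be 4 * x. With it, the forward value of y is unchanged but only one term is differentiated.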

answered Sep 23 '22 13:09

mrry
