
In TensorFlow, what is the difference between trainable and stop_gradient?

I would like to know the difference between the option trainable=False and tf.stop_gradient(). If I set the trainable option to False, will my optimizer no longer consider the variable for training? Does this option make it a constant value throughout training?

pratsbhatt asked Aug 10 '17 11:08



1 Answer

trainable=False

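The variable is left out of the trainable-variables collection, so an optimizer using its default variable list does not consider it and creates no gradient update op for it. Its value therefore stays constant throughout training, although you can still change it explicitly with an assign op.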
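A minimal TensorFlow 2 sketch of this behavior, assuming eager execution; the names w_train, w_frozen and module and the tiny scalar loss are hypothetical. It shows that the variable is absent from trainable_variables, that GradientTape returns no gradient for it, and that it can still be reassigned by hand:

```python
import tensorflow as tf

# Hypothetical scalar model: loss = (w_train * x + w_frozen) ** 2
w_train = tf.Variable(2.0, name="w_train")                     # trainable=True by default
w_frozen = tf.Variable(3.0, trainable=False, name="w_frozen")  # excluded from training
x = tf.constant(1.5)

# Collections such as tf.Module.trainable_variables skip w_frozen, so a
# training loop that collects trainable variables never hands it to the optimizer.
module = tf.Module()
module.w_train = w_train
module.w_frozen = w_frozen
print([v.name for v in module.trainable_variables])

with tf.GradientTape() as tape:  # the tape only auto-watches trainable variables
    loss = (w_train * x + w_frozen) ** 2
grads = tape.gradient(loss, [w_train, w_frozen])
print(grads)  # the gradient for w_frozen is None

# One manual SGD step (learning rate 0.1): only w_train changes.
w_train.assign_sub(0.1 * grads[0])

# Non-trainable is not the same as immutable: the value can still be assigned.
w_frozen.assign(5.0)
```

So "constant" here means constant with respect to the optimizer, not read-only.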

stop_gradient

In certain situations you want to compute the gradient of an op with respect to some variables while keeping a few other variables constant, yet other ops still need those same variables in their own gradient computations. Here you can't use trainable=False, because those variables must remain trainable for the other ops.

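stop_gradient works at the level of individual ops: it passes its input through unchanged in the forward pass but blocks the gradient in the backward pass. This lets you selectively optimize an op with respect to a chosen few variables while keeping the others constant.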
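To see the selective blocking in action, here is a minimal TensorFlow 2 sketch (eager mode assumed; the variable w and both ops are hypothetical). The same trainable variable feeds two ops, and stop_gradient cuts only one of them, so w keeps receiving gradient from the other:

```python
import tensorflow as tf

w = tf.Variable(3.0)  # one trainable variable shared by two ops
x = tf.constant(2.0)

with tf.GradientTape() as tape:
    h = w * x                            # op A: gradient flows back into w
    h_blocked = tf.stop_gradient(w * x)  # op B: same forward value, gradient blocked
    loss = h ** 2 + h_blocked ** 2

g = tape.gradient(loss, w)
print(h.numpy(), h_blocked.numpy())  # identical forward values
print(g.numpy())  # 24.0: only op A contributes, 2 * h * x

# Without stop_gradient, both terms contribute to the gradient.
with tf.GradientTape() as tape2:
    loss_plain = (w * x) ** 2 + (w * x) ** 2
g_plain = tape2.gradient(loss_plain, w)
print(g_plain.numpy())  # 48.0
```

This is the case trainable=False cannot express: marking w non-trainable would remove its gradient from both ops at once.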

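import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

# Assumed for illustration: 4 input features, 8 hidden units, MSE as cost_function
x = tf.placeholder(tf.float32, [None, 4])
y = tf.placeholder(tf.float32, [None, 1])
W1 = tf.get_variable('w1', [4, 8])
b1 = tf.get_variable('b1', [8])
W2 = tf.get_variable('w2', [8, 1])
b2 = tf.get_variable('b2', [1])
cost_function = lambda pred, target: tf.reduce_mean(tf.square(pred - target))

y1 = tf.stop_gradient(tf.matmul(x, W1) + b1)  # identity forward, no gradient backward
y2 = tf.matmul(y1, W2) + b2
cost = cost_function(y2, y)
# this op won't optimize the cost with respect to W1 and b1
train_op_w2_b2 = tf.train.MomentumOptimizer(0.001, 0.9).minimize(cost)

W1_frozen = tf.get_variable('w1_frozen', [4, 8], trainable=False)  # new name: 'w1' already exists
y1 = tf.matmul(x, W1_frozen) + b1
y2 = tf.matmul(y1, W2) + b2
cost = cost_function(y2, y)
# this op won't optimize the cost with respect to W1_frozen (b1, W2, b2 still train)
train_op = tf.train.MomentumOptimizer(0.001, 0.9).minimize(cost)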
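For TensorFlow 2 eager code, the same two cases can be sketched roughly as follows. The toy data, initial values, learning rate and the helper names make_vars and train_step are assumptions for illustration, and plain SGD stands in for MomentumOptimizer so the arithmetic stays easy to check:

```python
import tensorflow as tf

# Assumed toy data and hyperparameters (not from the answer).
x = tf.constant([[1.0, 2.0]])
y = tf.constant([[1.0]])
lr = 0.1

def make_vars(w1_trainable=True):
    """Hypothetical helper: fresh variables for a two-layer linear model."""
    return dict(
        W1=tf.Variable([[0.5], [0.5]], trainable=w1_trainable),
        b1=tf.Variable([0.0]),
        W2=tf.Variable([[2.0]]),
        b2=tf.Variable([0.0]),
    )

def train_step(v, use_stop_gradient):
    """Hypothetical helper: one plain SGD step over the trainable variables in v."""
    trainable = [var for var in v.values() if var.trainable]
    with tf.GradientTape() as tape:
        h = tf.matmul(x, v["W1"]) + v["b1"]
        y1 = tf.stop_gradient(h) if use_stop_gradient else h
        y2 = tf.matmul(y1, v["W2"]) + v["b2"]
        cost = tf.reduce_mean(tf.square(y2 - y))
    grads = tape.gradient(cost, trainable)
    for g, var in zip(grads, trainable):
        if g is not None:  # variables behind stop_gradient receive no gradient
            var.assign_sub(lr * g)

# Case 1: stop_gradient, so W1 and b1 stay fixed while W2 and b2 move
v1 = make_vars()
train_step(v1, use_stop_gradient=True)

# Case 2: trainable=False on W1, so only W1 stays fixed and b1 still trains
v2 = make_vars(w1_trainable=False)
train_step(v2, use_stop_gradient=False)

for name in ["W1", "b1", "W2", "b2"]:
    print(name, v1[name].numpy().ravel(), v2[name].numpy().ravel())
```

After one step, W1 is unchanged in both cases, but b1 moves only in the trainable=False case: stop_gradient cut the whole first layer out of the backward pass, whereas trainable=False froze just W1.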
Ishant Mrinal answered Sep 30 '22 18:09