
Why does Tensorflow Reshape tf.reshape() break the flow of gradients?

I create a tf.Variable(), define a simple function of that variable, then flatten the original variable with tf.reshape(), and finally take tf.gradients() between the function and the flattened variable. Why does that return [None]?

import numpy as np
import tensorflow as tf

var = tf.Variable(np.ones((5, 5)), dtype=tf.float32)
f = tf.reduce_sum(tf.reduce_sum(tf.square(var)))
var_f = tf.reshape(var, [-1])
print(tf.gradients(f, var_f))

When executed, the code above prints [None]. Is this a bug? Please help!

Asked Jun 30 '17 by Sri Ramana

People also ask

What does TF reshape do?

tf.reshape(t, []) reshapes a tensor t with one element to a scalar.
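A minimal sketch of that behavior (assuming TensorFlow 2.x):

```python
import tensorflow as tf

t = tf.constant([[7]])           # shape (1, 1): a single element
scalar = tf.reshape(t, [])       # shape ():    a scalar
flat = tf.reshape(t, [-1])       # shape (1,):  a flat 1-D tensor
print(scalar.shape, flat.shape)
```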

How does gradient work in TensorFlow?

Gradient tapes: TensorFlow "records" relevant operations executed inside the context of a tf.GradientTape onto a "tape". TensorFlow then uses that tape to compute the gradients of a "recorded" computation using reverse-mode differentiation.
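As a small illustration (a sketch, assuming TensorFlow 2.x eager execution):

```python
import tensorflow as tf

x = tf.Variable(3.0)
with tf.GradientTape() as tape:
    y = x * x                # this op is recorded on the tape
dy_dx = tape.gradient(y, x)  # reverse-mode: d(x^2)/dx = 2x = 6.0
print(dy_dx.numpy())
```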

What is TF gradient?

tf.gradients() adds ops to the graph to output the derivatives of ys with respect to xs. It returns a list of Tensors of length len(xs), where each tensor is sum(dy/dx) for y in ys and for x in xs. grad_ys is a list of tensors of the same length as ys that holds the initial gradients for each y in ys.


1 Answer

You are taking the derivative of f with respect to var_f, but f is not a function of var_f; it was built from var. tf.reshape() creates a new tensor in the graph, and no op connects var_f to f, so tf.gradients() returns [None]. If you change the code to:

var = tf.Variable(np.ones((5, 5)), dtype=tf.float32)
var_f = tf.reshape(var, [-1])
f = tf.reduce_sum(tf.reduce_sum(tf.square(var_f)))
grad = tf.gradients(f, var_f)
print(grad)

your gradients will be defined:

[<tf.Tensor 'gradients_28/Square_32_grad/mul_1:0' shape=(25,) dtype=float32>]

The visualization of the graphs for the following code is given below:

var = tf.Variable(np.ones((5, 5)), dtype=tf.float32, name='var')
f = tf.reduce_sum(tf.reduce_sum(tf.square(var)), name='f')
var_f = tf.reshape(var, [-1], name='var_f')
grad_1 = tf.gradients(f, var_f, name='grad_1')
grad_2 = tf.gradients(f, var, name='grad_2')

[Image: graph visualization of grad_1 and grad_2]

grad_1 is not defined (it is None), while grad_2 is. The back-propagation (gradient) subgraphs for the two calls are shown in the visualization.
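If a flattened gradient is what's actually wanted, one workaround is to differentiate with respect to the original var and then reshape the resulting gradient. A sketch, assuming the tf.compat.v1 shim to run this graph-mode code on TensorFlow 2.x:

```python
import numpy as np
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()  # restore TF1-style graph mode

var = tf.Variable(np.ones((5, 5)), dtype=tf.float32)
f = tf.reduce_sum(tf.square(var))

# Differentiate with respect to the original variable, then flatten the gradient:
grad_flat = tf.reshape(tf.gradients(f, var)[0], [-1])  # shape (25,)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    g = sess.run(grad_flat)
print(g)  # each entry is d(x^2)/dx = 2x = 2.0 at x = 1
```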

Answered Oct 22 '22 by vijay m