 

Tensorflow: why is the zip() function used in the steps that apply the gradients?

I am working through Assignment 6 of the Udacity Deep Learning course. I am unsure why the zip() function is used in these steps to apply the gradients.

Here is the relevant code:

# Define the loss function. Note the TF 0.x-era API: tf.concat took the axis
# as its first argument, and softmax_cross_entropy_with_logits took positional
# arguments. outputs, w, b and train_labels are defined earlier in the assignment.
logits = tf.nn.xw_plus_b(tf.concat(0, outputs), w, b)
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits, tf.concat(0, train_labels)))

# Optimizer.
global_step = tf.Variable(0)
# staircase=True means the learning rate decays at discrete time steps
learning_rate = tf.train.exponential_decay(10.0, global_step, 5000, 0.1, staircase=True)
optimizer = tf.train.GradientDescentOptimizer(learning_rate)

# Unzip the (gradient, variable) pairs, clip the gradients, re-zip, and apply.
gradients, v = zip(*optimizer.compute_gradients(loss))
gradients, _ = tf.clip_by_global_norm(gradients, 1.25)
optimizer = optimizer.apply_gradients(zip(gradients, v), global_step=global_step)

What is the purpose of applying the zip() function?

Why are gradients and v stored that way? I thought zip(*iterable) returned just one zip object.

asked Jul 25 '16 by Taivanbat Badamdorj

People also ask

How does gradient work in TensorFlow?

TensorFlow "records" relevant operations executed inside the context of a tf.GradientTape onto a "tape". TensorFlow then uses that tape to compute the gradients of a "recorded" computation using reverse-mode differentiation.
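A minimal TF 2.x sketch of that idea (the variable and its value are made up for illustration):

import tensorflow as tf

x = tf.Variable(3.0)
with tf.GradientTape() as tape:
    y = x * x                     # operations inside the context are recorded on the tape
dy_dx = tape.gradient(y, x)       # reverse-mode differentiation: dy/dx = 2x = 6.0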

How does TensorFlow compute derivatives?

TensorFlow calculates derivatives using automatic differentiation. This is different from both symbolic differentiation and numeric differentiation (a.k.a. finite differences). It is less a clever mathematical technique than a clever programming technique.
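A rough sketch of the contrast (TF 2.x, illustrative values only): the tape gives the exact derivative, while a finite-difference estimate only approximates it.

import tensorflow as tf

x = tf.Variable(2.0)
with tf.GradientTape() as tape:
    y = tf.sin(x)
exact = tape.gradient(y, x)                      # automatic differentiation: cos(2.0)

h = 1e-4                                         # numeric differentiation (finite differences)
approx = (tf.sin(x + h) - tf.sin(x - h)) / (2 * h)
print(float(exact), float(approx))               # nearly equal; approx carries truncation error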

What does the minimize function of Optimizer do?

Calling minimize() takes care of both computing the gradients and applying them to the variables. If you want to process the gradients before applying them, you can instead use the optimizer in three steps: compute the gradients with tf.GradientTape, process them as you wish, and apply the processed gradients with apply_gradients().
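A hedged TF 2.x sketch of that three-step pattern (the variable and loss here are placeholders, not from the question):

import tensorflow as tf

opt = tf.keras.optimizers.SGD(learning_rate=0.1)
w = tf.Variable([1.0, 2.0])                      # placeholder trainable variable

with tf.GradientTape() as tape:
    loss = tf.reduce_sum(w * w)                  # forward pass recorded on the tape
grads = tape.gradient(loss, [w])                 # 1. compute the gradients
grads, _ = tf.clip_by_global_norm(grads, 1.25)   # 2. process them, e.g. clip
opt.apply_gradients(zip(grads, [w]))             # 3. apply the processed gradients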


1 Answer

I don't know Tensorflow, but presumably optimizer.compute_gradients(loss) yields (gradient, value) tuples.

gradients, v = zip(*optimizer.compute_gradients(loss))

performs a transposition, creating a list of gradients and a list of values.
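In plain Python, with made-up stand-ins for the (gradient, value) pairs: zip(*pairs) does produce a single iterable, but it yields one tuple per "column", so unpacking it into two names works whenever each pair has length two.

pairs = [('g0', 'w'), ('g1', 'b'), ('g2', 'u')]  # pretend (gradient, variable) pairs
gradients, v = zip(*pairs)
print(gradients)                                 # ('g0', 'g1', 'g2')
print(v)                                         # ('w', 'b', 'u')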

gradients, _ = tf.clip_by_global_norm(gradients, 1.25)

then clips the gradients (tf.clip_by_global_norm also returns the global norm, which is discarded here into _), and

optimizer = optimizer.apply_gradients(zip(gradients, v), global_step=global_step)

re-zips the gradient and value lists back into an iterable of (gradient, value) tuples which is then passed to the optimizer.apply_gradients method.
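Continuing the toy example above, re-zipping restores the pair structure that apply_gradients expects:

clipped = ('G0', 'G1', 'G2')                     # pretend these are the clipped gradients
print(list(zip(clipped, v)))                     # [('G0', 'w'), ('G1', 'b'), ('G2', 'u')]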

answered Oct 12 '22 by PM 2Ring