I am working through Assignment 6 of the Udacity Deep Learning course. I am unsure why the zip() function is used in these steps to apply the gradients.
Here is the relevant code:
# Define the loss function.
# Note: tf.concat(dim, values) is the pre-1.0 TensorFlow argument order;
# newer versions expect tf.concat(values, axis).
logits = tf.nn.xw_plus_b(tf.concat(0, outputs), w, b)
loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(logits, tf.concat(0, train_labels)))

# Optimizer.
global_step = tf.Variable(0)
# staircase=True means the learning rate decays at discrete intervals
# rather than continuously.
learning_rate = tf.train.exponential_decay(10.0, global_step, 5000, 0.1, staircase=True)
optimizer = tf.train.GradientDescentOptimizer(learning_rate)
gradients, v = zip(*optimizer.compute_gradients(loss))
gradients, _ = tf.clip_by_global_norm(gradients, 1.25)
optimizer = optimizer.apply_gradients(zip(gradients, v), global_step=global_step)
What is the purpose of applying the zip() function? Why are gradients and v stored that way? I thought zip(*iterable) returned just one zip object.
Gradient tapes: TensorFlow "records" relevant operations executed inside the context of a tf.GradientTape onto a "tape". TensorFlow then uses that tape to compute the gradients of a "recorded" computation using reverse-mode differentiation.

TensorFlow calculates derivatives using automatic differentiation. This is different from symbolic differentiation and from numeric differentiation (a.k.a. finite differences). More than a clever mathematical approach, it is a clever programming approach.

Calling minimize() takes care of both computing the gradients and applying them to the variables. If you want to process the gradients before applying them, you can instead use the optimizer in three steps: compute the gradients with tf.GradientTape, process them as you wish, and apply the processed gradients with apply_gradients().
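For comparison, a minimal sketch of that three-step pattern in current TensorFlow 2 might look like this (the model, optimizer, and dummy data below are assumptions for illustration, not part of the assignment code):

import tensorflow as tf

# Hypothetical model and data, just to have something to differentiate.
model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)
x = tf.random.normal([8, 4])
y = tf.random.normal([8, 1])

# 1. Compute the gradients with tf.GradientTape.
with tf.GradientTape() as tape:
    loss = tf.reduce_mean(tf.square(model(x) - y))
grads = tape.gradient(loss, model.trainable_variables)

# 2. Process the gradients (here: clip them by global norm).
grads, _ = tf.clip_by_global_norm(grads, 1.25)

# 3. Apply the processed gradients, re-paired with their variables.
optimizer.apply_gradients(zip(grads, model.trainable_variables))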
I don't know TensorFlow, but presumably optimizer.compute_gradients(loss) yields a list of (gradient, variable) tuples.
gradients, v = zip(*optimizer.compute_gradients(loss))
performs a transposition, creating a tuple of gradients and a tuple of variables. zip(*iterable) does return a single zip object, but here that object yields exactly two elements (all the gradients, then all the variables), and the tuple assignment gradients, v = ... unpacks them into the two names.
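You can see the transposition in plain Python, with strings standing in for the real gradient and variable tensors (these placeholder values are made up for illustration):

# Hypothetical stand-ins for the pairs compute_gradients would return.
pairs = [('g0', 'v0'), ('g1', 'v1'), ('g2', 'v2')]

gradients, v = zip(*pairs)  # unpack the two "columns" of the pair list
print(gradients)  # ('g0', 'g1', 'g2')
print(v)          # ('v0', 'v1', 'v2')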
gradients, _ = tf.clip_by_global_norm(gradients, 1.25)
then clips the gradients, and
optimizer = optimizer.apply_gradients(zip(gradients, v), global_step=global_step)
re-zips the clipped gradients with the variables, recreating an iterable of (gradient, variable) tuples, which is then passed to the optimizer.apply_gradients method.
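Continuing the toy example above, re-zipping restores the pairing once the gradients have been processed:

clipped = [g.upper() for g in gradients]  # stand-in for tf.clip_by_global_norm
print(list(zip(clipped, v)))  # [('G0', 'v0'), ('G1', 'v1'), ('G2', 'v2')]

So the two zip() calls are just a round trip: split the (gradient, variable) pairs apart so all the gradients can be clipped as one group, then pair them back up for apply_gradients.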