 

How to apply gradient clipping in TensorFlow?

Consider the example code below.

I would like to know how to apply gradient clipping to this network, an RNN where there is a possibility of exploding gradients.

tf.clip_by_value(t, clip_value_min, clip_value_max, name=None) 

This is an example that could be used, but where do I introduce it? In the definition of the RNN?

    lstm_cell = rnn_cell.BasicLSTMCell(n_hidden, forget_bias=1.0)
    # Split data because rnn cell needs a list of inputs for the RNN inner loop
    _X = tf.split(0, n_steps, _X)  # n_steps
    tf.clip_by_value(_X, -1, 1, name=None)

But this doesn't make sense, as the tensor _X is the input, not the gradient that is to be clipped.

Do I have to define my own optimizer for this, or is there a simpler option?

asked Apr 08 '16 by Arsenal Fanatic

People also ask

How do you apply gradient clippings?

Gradient clipping basically helps in the case of exploding or vanishing gradients. Say your loss is too high, which results in exponentially large gradients flowing through the network and may produce NaN values. To overcome this, we clip gradients to a specific range (-1 to 1, or any range as appropriate).
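
For illustration, here is a minimal sketch of element-wise clipping with tf.clip_by_value (the tensor values are made up for the example):

    import tensorflow as tf

    # Values below -1 become -1, values above 1 become 1, the rest pass through
    t = tf.constant([-5.0, -0.3, 0.0, 0.7, 4.2])
    clipped = tf.clip_by_value(t, clip_value_min=-1.0, clip_value_max=1.0)
    # clipped -> [-1.0, -0.3, 0.0, 0.7, 1.0]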

How does gradient clipping work?

With gradient clipping, a pre-determined gradient threshold is introduced, and gradient norms that exceed this threshold are scaled down to match it. This prevents any gradient from having a norm greater than the threshold, and thus the gradients are clipped.
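
A minimal sketch of this norm-scaling variant using tf.clip_by_norm (the example tensor is an assumption, chosen so its L2 norm is easy to check):

    import tensorflow as tf

    g = tf.constant([[3.0, 4.0]])          # L2 norm is 5.0
    clipped = tf.clip_by_norm(g, clip_norm=1.0)
    # clipped -> [[0.6, 0.8]]: same direction, scaled down so its norm is 1.0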

How does gradient work in TensorFlow?

With gradient tapes, TensorFlow "records" relevant operations executed inside the context of a tf.GradientTape onto a "tape". TensorFlow then uses that tape to compute the gradients of the "recorded" computation using reverse-mode differentiation.
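
A minimal tf.GradientTape sketch (TF 2.x), assuming a single scalar variable:

    import tensorflow as tf

    x = tf.Variable(3.0)
    with tf.GradientTape() as tape:
        y = x * x                  # recorded on the tape
    grad = tape.gradient(y, x)     # dy/dx = 2*x = 6.0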

What is Clip_grad_norm_?

nn.utils.clip_grad_norm_ performs gradient clipping. It is used to mitigate the problem of exploding gradients, which is of particular concern for recurrent networks (of which LSTMs are a type).
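
For comparison, a minimal PyTorch sketch; the model and data here are hypothetical placeholders. clip_grad_norm_ is called after backward() and before optimizer.step():

    import torch
    import torch.nn as nn

    model = nn.Linear(4, 1)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

    loss = model(torch.randn(8, 4)).pow(2).mean()
    loss.backward()
    # Rescale all parameter gradients in place so their total norm is at most 1.0
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()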


1 Answer

Gradient clipping needs to happen after computing the gradients, but before applying them to update the model's parameters. In your example, both of those things are handled by the AdamOptimizer.minimize() method.

In order to clip your gradients, you'll need to explicitly compute, clip, and apply them as described in this section of TensorFlow's API documentation. Specifically, you'll need to replace the call to the minimize() method with something like the following:

    optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
    # Compute the gradients as (gradient, variable) pairs instead of calling minimize()
    gvs = optimizer.compute_gradients(cost)
    # Clip each gradient element-wise to the range [-1, 1]
    capped_gvs = [(tf.clip_by_value(grad, -1., 1.), var) for grad, var in gvs]
    # Apply the clipped gradients to update the variables
    train_op = optimizer.apply_gradients(capped_gvs)
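
If you would rather clip by the joint norm of all gradients than element-wise, TensorFlow also provides tf.clip_by_global_norm; a sketch under the same TF 1.x-style API (the clip_norm of 5.0 is an arbitrary choice):

    optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
    grads, variables = zip(*optimizer.compute_gradients(cost))
    # Scale all gradients jointly so their combined (global) norm is at most 5.0
    clipped_grads, _ = tf.clip_by_global_norm(grads, clip_norm=5.0)
    train_op = optimizer.apply_gradients(zip(clipped_grads, variables))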
answered Sep 19 '22 by Styrke