How to do gradient clipping in PyTorch?

What is the correct way to perform gradient clipping in PyTorch?

I have an exploding gradients problem.

asked Feb 15 '19 by Gulzar


People also ask

What is gradient clipping in PyTorch?

Using gradient clipping you can prevent exploding gradients in neural networks. Gradient clipping limits the magnitude of the gradient. There are many ways to perform gradient clipping, but a common one is to rescale the gradients so that their norm is at most a particular value.
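In PyTorch this norm-based rescaling is available as a built-in utility. A minimal sketch (assuming model, optimizer and loss already exist, and using an illustrative maximum norm of 1.0):

import torch

loss.backward()                                                    # compute gradients
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)   # rescale so the total norm is at most 1.0
optimizer.step()                                                   # update with the clipped gradients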

How do you do gradient clipping?

Gradient clipping-by-norm: the idea behind clipping-by-norm is similar to clipping-by-value. The difference is that we clip the gradients by multiplying the unit vector of the gradients with the threshold, i.e. g ← threshold * g / ‖g‖ whenever ‖g‖ > threshold, where the threshold is a hyperparameter, g is the gradient, and ‖g‖ is the norm of g.
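A hand-rolled sketch of this rule for a single gradient tensor could look as follows (illustration only, not the PyTorch built-in; the function name clip_by_norm is made up here):

import torch

def clip_by_norm(grad: torch.Tensor, threshold: float) -> torch.Tensor:
    # ‖g‖: Euclidean norm of the gradient
    norm = grad.norm()
    if norm > threshold:
        # rescale the unit vector g / ‖g‖ by the threshold
        grad = grad * (threshold / norm)
    return grad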

How do you clip gradients in TensorFlow?

Applying gradient clipping in TensorFlow models is quite straightforward. The only thing you need to do is pass the parameter to the optimizer function. All optimizers have `clipnorm` and `clipvalue` parameters that can be used to clip the gradients.
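For example, with a Keras optimizer this might look like the following sketch (the learning rate and thresholds are arbitrary values chosen for illustration):

import tensorflow as tf

# clip the overall gradient norm of each update to at most 1.0
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01, clipnorm=1.0)

# or clip each gradient value element-wise to [-0.5, 0.5]
# optimizer = tf.keras.optimizers.SGD(learning_rate=0.01, clipvalue=0.5)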

Is gradient clipping good?

Vanishing gradients can happen when optimization gets stuck at a certain point because the gradient is too small to make progress. Gradient clipping can prevent the gradient issues that mess up the parameters during training.


2 Answers

A more complete example from here:

optimizer.zero_grad()                                            # reset gradients from the previous step
loss, hidden = model(data, hidden, targets)                      # forward pass returns the loss and the new hidden state
loss.backward()                                                  # backprop to compute gradients

torch.nn.utils.clip_grad_norm_(model.parameters(), args.clip)    # clip the total gradient norm to args.clip
optimizer.step()                                                 # update parameters with the clipped gradients
answered Oct 21 '22 by Rahul


clip_grad_norm (which is deprecated in favor of clip_grad_norm_, following the more consistent convention of a trailing _ for functions that modify their arguments in-place) clips the norm of the overall gradient, computed over all parameters passed to the function, as can be seen from the documentation:

The norm is computed over all gradients together, as if they were concatenated into a single vector. Gradients are modified in-place.
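Because the norm is taken over all parameters at once, the value returned by clip_grad_norm_ (it returns the total norm computed before clipping) can also be used to check whether gradients are actually exploding. A small sketch, reusing model and args.clip from the answer above:

# returns the total gradient norm over all parameters, computed before clipping
total_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=args.clip)
if total_norm > args.clip:
    print(f"gradient norm {total_norm:.2f} exceeded {args.clip} and was clipped")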

From your example it looks like you want clip_grad_value_ instead, which has a similar syntax and also modifies the gradients in-place:

torch.nn.utils.clip_grad_value_(model.parameters(), clip_value)

Another option is to register a backward hook. The hook takes the current gradient as input and may return a tensor, which is then used in place of the original gradient, i.e. it modifies it. The hook is called every time a gradient for that parameter has been computed, so there is no need to clip manually once the hooks are registered:

for p in model.parameters():
    # clamp each gradient element-wise to [-clip_value, clip_value] as soon as it is computed
    p.register_hook(lambda grad: torch.clamp(grad, -clip_value, clip_value))
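The hooks only need to be registered once, e.g. right after the model is created; from then on every call to loss.backward() clamps each parameter's gradient element-wise to [-clip_value, clip_value], which has the same effect as clip_grad_value_. If the clipping should be disabled later, register_hook returns a handle that can be removed, as in this sketch:

handles = [p.register_hook(lambda grad: torch.clamp(grad, -clip_value, clip_value))
           for p in model.parameters()]

# ... training ...

for h in handles:
    h.remove()  # detach the hooks once clipping is no longer wanted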
answered Oct 21 '22 by a_guest