What is the correct way to perform gradient clipping in pytorch?
I have an exploding gradients problem.
Gradient clipping prevents exploding gradients in neural networks by limiting the magnitude of the gradient. There are several ways to clip gradients, but a common one is to rescale them so that their norm is at most a chosen threshold.
Gradient clipping by norm: the idea is similar to clipping by value, except that we rescale the gradient by multiplying its unit vector by the threshold: g ← threshold · g / ‖g‖ whenever ‖g‖ > threshold, where the threshold is a hyperparameter, g is the gradient, and ‖g‖ is the norm of g. Because g/‖g‖ is a unit vector, the rescaled gradient keeps its direction and its norm becomes exactly the threshold.
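To make the formula concrete, here is a minimal plain-Python sketch of clipping by norm (a list of floats stands in for a gradient tensor; this illustrates the math, not the PyTorch API):

```python
import math

def clip_by_norm(grad, threshold):
    """Rescale a gradient vector so its L2 norm is at most `threshold`.

    `grad` is a plain list of floats standing in for a gradient tensor;
    illustrative sketch of the formula, not the PyTorch implementation.
    """
    norm = math.sqrt(sum(g * g for g in grad))
    if norm > threshold:
        # g <- threshold * g / ||g||: direction is kept, norm becomes `threshold`
        return [threshold * g / norm for g in grad]
    return grad

clip_by_norm([3.0, 4.0], threshold=1.0)  # original norm is 5.0 -> [0.6, 0.8]
```

Gradients below the threshold pass through untouched; only the overly large ones are scaled down.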
For comparison, applying gradient clipping in TensorFlow/Keras models is quite straightforward: you only need to pass the parameter to the optimizer constructor. Keras optimizers accept `clipnorm` and `clipvalue` parameters that can be used to clip the gradients.
Vanishing gradients are a different problem: optimization stalls because the gradient is too small to make progress. Gradient clipping targets the opposite issue, exploding gradients, i.e. the overly large updates that mess up the parameters during training.
A more complete example:

```python
optimizer.zero_grad()                        # reset gradients from the previous step
loss, hidden = model(data, hidden, targets)
loss.backward()                              # compute gradients
torch.nn.utils.clip_grad_norm_(model.parameters(), args.clip)  # clip before the update
optimizer.step()                             # apply the (clipped) gradients
```
`clip_grad_norm` (which is actually deprecated in favor of `clip_grad_norm_`, following the more consistent syntax of a trailing `_` when in-place modification is performed) clips the norm of the overall gradient by concatenating all parameters passed to the function, as can be seen from the documentation:

> The norm is computed over all gradients together, as if they were concatenated into a single vector. Gradients are modified in-place.
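To make the "single concatenated vector" behavior concrete, here is a plain-Python sketch of a global-norm clip over several parameter gradients (illustrative names; not the actual PyTorch internals):

```python
import math

def clip_grads_global_norm(grads, max_norm):
    """Clip a list of gradient lists by their combined L2 norm, in place.

    Mirrors the documented behavior of `clip_grad_norm_`: the norm is
    computed over all gradients together, as if they were concatenated
    into a single vector. Plain-Python sketch for illustration.
    """
    total_norm = math.sqrt(sum(g * g for grad in grads for g in grad))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        for grad in grads:
            for i in range(len(grad)):
                grad[i] *= scale     # every parameter is scaled by the same factor
    return total_norm                # return the pre-clip norm, like clip_grad_norm_

grads = [[3.0], [4.0]]               # two "parameters", combined norm 5.0
clip_grads_global_norm(grads, max_norm=1.0)
```

Note that every parameter's gradient is scaled by the same factor, so the relative magnitudes between parameters are preserved.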
From your example it looks like you want `clip_grad_value_` instead, which has a similar syntax and also modifies the gradients in-place:

```python
torch.nn.utils.clip_grad_value_(model.parameters(), clip_value)
```
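Clipping by value clamps each gradient component independently; here is a plain-Python sketch of that element-wise operation (illustrative, not the `clip_grad_value_` implementation):

```python
def clip_by_value(grad, clip_value):
    """Clamp each gradient component to [-clip_value, clip_value].

    Plain-Python sketch of per-element value clipping; unlike clipping
    by norm, this can change the direction of the gradient.
    """
    return [max(-clip_value, min(clip_value, g)) for g in grad]

clip_by_value([0.5, -2.0, 3.0], clip_value=1.0)  # -> [0.5, -1.0, 1.0]
```

Components already inside the range are untouched, so the direction of the gradient vector as a whole can change, which is the main practical difference from norm clipping.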
Another option is to register a backward hook. The hook takes the current gradient as an input and may return a tensor, which will be used in place of the previous gradient, i.e. it modifies the gradient. The hook is called each time after a gradient has been computed, so there is no need for manual clipping once the hook has been registered:

```python
for p in model.parameters():
    p.register_hook(lambda grad: torch.clamp(grad, -clip_value, clip_value))
```
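The hook mechanism can be pictured as a small pipeline: whenever a gradient is produced, each registered function gets a chance to replace it. A plain-Python sketch of that idea, using the same clamp as the `torch.clamp` lambda above (illustrative names, not the PyTorch internals):

```python
def make_clamp_hook(clip_value):
    """Return a hook that clamps each gradient component (plain-Python sketch)."""
    def hook(grad):
        return [max(-clip_value, min(clip_value, g)) for g in grad]
    return hook

hooks = [make_clamp_hook(1.0)]       # stands in for register_hook on a parameter

def run_backward(raw_grad):
    """Run every registered hook over a freshly computed gradient."""
    grad = raw_grad
    for h in hooks:
        out = h(grad)
        if out is not None:          # a hook returning a value replaces the gradient
            grad = out
    return grad

run_backward([2.5, -0.3, -7.0])      # -> [1.0, -0.3, -1.0]
```

Because the clamp runs automatically on every backward pass, the training loop itself no longer contains any clipping code.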