Why do we need to explicitly zero the gradients in PyTorch? Why can't gradients be zeroed when loss.backward()
is called? What scenario is served by keeping the gradients on the graph and asking the user to explicitly zero the gradients?
zero_grad() clears the gradients left over from the previous step when you use gradient descent to reduce the error (or loss). If you do not call zero_grad(), the gradients from every step accumulate, so the updates are wrong and the loss can increase instead of decrease as required.
This ensures that we aren't carrying over stale gradient information when we train our neural network. You can also call model.zero_grad() to make sure all gradients of the model's parameters are zero, e.g. if you have two or more optimizers for one model.
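For illustration, here is a minimal sketch of that two-optimizer situation, where a single model.zero_grad() clears every parameter's gradient before the next backward pass. The layer sizes, learning rates, and random data below are placeholders, not something taken from the answer itself.

```python
import torch
import torch.nn as nn

# Sketch: one model whose parameters are split across two optimizers.
model = nn.Sequential(nn.Linear(10, 20), nn.ReLU(), nn.Linear(20, 1))
opt_body = torch.optim.SGD(model[0].parameters(), lr=1e-2)
opt_head = torch.optim.Adam(model[2].parameters(), lr=1e-3)

x, y = torch.randn(32, 10), torch.randn(32, 1)
loss_fn = nn.MSELoss()

for _ in range(5):
    # One call clears .grad on every parameter of the model,
    # regardless of which optimizer owns it.
    model.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt_body.step()
    opt_head.step()
```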
Take a loss function such as MSELoss, which computes the mean-squared error between the input and the target. When we call loss.backward(), the whole graph is differentiated w.r.t. the loss, and every tensor in the graph with requires_grad=True has the gradient accumulated into its .grad attribute.
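The accumulation behaviour is easy to see directly. The following is a small sketch with an arbitrary tensor and target; the values are only illustrative.

```python
import torch
import torch.nn as nn

# Sketch: .grad accumulates across backward() calls.
w = torch.ones(3, requires_grad=True)
target = torch.zeros(3)
loss_fn = nn.MSELoss()

loss = loss_fn(w * 2.0, target)
loss.backward()
print(w.grad)      # gradient from the first backward pass

loss = loss_fn(w * 2.0, target)
loss.backward()
print(w.grad)      # twice as large: the new gradient was added to the old one

w.grad.zero_()     # what zero_grad() does for each parameter
print(w.grad)      # back to zeros
```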
We explicitly need to call zero_grad() because, after loss.backward() (when gradients are computed), we call optimizer.step() to perform the gradient-descent update. More specifically, the gradients are not zeroed automatically because these two operations, loss.backward() and optimizer.step(), are separate, and optimizer.step() requires the just-computed gradients.
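A minimal sketch of the usual ordering within one training step; the model, data, and hyperparameters are placeholders.

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

x, y = torch.randn(64, 10), torch.randn(64, 1)

for _ in range(10):
    optimizer.zero_grad()        # clear gradients left over from the previous step
    loss = loss_fn(model(x), y)  # forward pass
    loss.backward()              # compute fresh gradients
    optimizer.step()             # update parameters using exactly those gradients
```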
In addition, sometimes we need to accumulate gradients over several batches; to do that, we can simply call backward() multiple times and call optimizer.step() once.
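A minimal sketch of that gradient-accumulation pattern follows. The accumulation_steps value, the model, and the data are illustrative placeholders, and the division of the loss by accumulation_steps is an optional normalization, not something the answer prescribes.

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

accumulation_steps = 4
batches = [(torch.randn(16, 10), torch.randn(16, 1)) for _ in range(accumulation_steps)]

optimizer.zero_grad()
for i, (x, y) in enumerate(batches):
    loss = loss_fn(model(x), y) / accumulation_steps  # scale so the sum approximates one large batch
    loss.backward()                                   # gradients accumulate in .grad
    if (i + 1) % accumulation_steps == 0:
        optimizer.step()       # one update from the accumulated gradients
        optimizer.zero_grad()  # start the next accumulation window from zero
```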