Why do we need to explicitly call zero_grad()? [duplicate]

Why do we need to explicitly zero the gradients in PyTorch? Why can't gradients be zeroed when loss.backward() is called? What scenario is served by keeping the gradients on the graph and asking the user to explicitly zero the gradients?

Wasi Ahmad asked Jun 24 '17


People also ask

Why do we need to call zero_grad() in PyTorch?

zero_grad() clears the gradients left over from the previous step when you use gradient-based optimization to reduce the error (or loss). If you do not call zero_grad(), gradients from earlier steps keep accumulating, so the updates are based on stale gradients and the loss can increase instead of decrease as required.

Why do you need to call optimizer.zero_grad() in a neural net model in PyTorch?

This ensures that gradients from previous batches are not carried over when we train our neural network, so each update uses only the gradients of the current batch. You can also use model.zero_grad().

What does model.zero_grad() do?

model.zero_grad() sets the gradients of all the model's parameters to zero, which is useful, e.g., if you have two or more optimizers for one model.
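
For example (a minimal sketch; the model layout, learning rates, and data below are made up), model.zero_grad() clears the gradients of every parameter regardless of which optimizer owns it:

import torch
import torch.nn as nn
import torch.optim as optim

# Hypothetical setup: one model whose layers are split across two optimizers
# with different learning rates (all names and values are placeholders).
model = nn.Sequential(nn.Linear(10, 20), nn.ReLU(), nn.Linear(20, 1))
opt_body = optim.SGD(model[0].parameters(), lr=0.01)
opt_head = optim.SGD(model[2].parameters(), lr=0.001)

x, y = torch.randn(8, 10), torch.randn(8, 1)
loss = nn.functional.mse_loss(model(x), y)

model.zero_grad()   # zeroes .grad for every parameter, no matter which optimizer owns it
loss.backward()
opt_body.step()
opt_head.step()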

What does loss.backward() do?

Take a loss function such as MSELoss, which computes the mean-squared error between the input and the target. When we call loss.backward(), the whole graph is differentiated with respect to the loss, and every tensor in the graph with requires_grad=True has the resulting gradient accumulated into its .grad attribute.
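
A tiny demonstration of that accumulation (a sketch using an arbitrary scalar function): calling backward() twice without zeroing sums the two gradients into .grad rather than overwriting it.

import torch

w = torch.tensor(2.0, requires_grad=True)

loss = 3 * w            # d(loss)/dw = 3
loss.backward()
print(w.grad)           # tensor(3.)

loss = 3 * w
loss.backward()         # no zeroing in between
print(w.grad)           # tensor(6.) -- the second gradient was added, not assigned

w.grad.zero_()          # manual reset; optimizer.zero_grad() does this for its registered params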


1 Answer

We need to call zero_grad() explicitly because, after loss.backward() computes the gradients, we call optimizer.step() to perform the gradient-descent update. The gradients are not zeroed automatically because these two operations, loss.backward() and optimizer.step(), are deliberately kept separate, and optimizer.step() requires the gradients that were just computed.
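
As a sketch of that separation (the model, optimizer, loss, and random data below are placeholders; substitute your own), the usual training step looks like this:

import torch
import torch.nn as nn
import torch.optim as optim

# Placeholder model, optimizer, and random batches.
model = nn.Linear(10, 1)
optimizer = optim.SGD(model.parameters(), lr=0.01)
criterion = nn.MSELoss()
batches = [(torch.randn(16, 10), torch.randn(16, 1)) for _ in range(5)]

for inputs, targets in batches:
    optimizer.zero_grad()         # clear gradients left over from the previous step
    loss = criterion(model(inputs), targets)
    loss.backward()               # compute gradients and accumulate them into .grad
    optimizer.step()              # update parameters using the just-computed gradients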

In addition, we sometimes need to accumulate gradients over several batches; to do that, we can simply call backward() multiple times and call optimizer.step() once, as in the sketch below.
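
For example (a sketch; the accumulation factor, model, and data are made up), this simulates a larger effective batch size by stepping only every few batches:

import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(10, 1)
optimizer = optim.SGD(model.parameters(), lr=0.01)
criterion = nn.MSELoss()
batches = [(torch.randn(16, 10), torch.randn(16, 1)) for _ in range(8)]
accum_steps = 4                                  # made-up accumulation factor

optimizer.zero_grad()
for i, (inputs, targets) in enumerate(batches):
    loss = criterion(model(inputs), targets) / accum_steps  # scale so the summed grads average the batches
    loss.backward()                              # gradients keep accumulating in .grad
    if (i + 1) % accum_steps == 0:
        optimizer.step()                         # one update per accum_steps batches
        optimizer.zero_grad()                    # reset before the next accumulation window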

danche answered Oct 12 '22