
pytorch - connection between loss.backward() and optimizer.step()

Where is an explicit connection between the optimizer and the loss?

How does the optimizer know where to get the gradients of the loss without a call like optimizer.step(loss)?

-More context-

When I minimize the loss, I don't have to pass the gradients to the optimizer:

    loss.backward()   # backpropagation
    optimizer.step()  # gradient descent
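
A minimal sketch of the kind of training loop assumed here (the model, data, and learning rate are hypothetical placeholders):

    import torch
    import torch.nn as nn

    # Hypothetical setup: a tiny linear model and some dummy data.
    model = nn.Linear(10, 1)
    criterion = nn.MSELoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    x = torch.randn(32, 10)
    y = torch.randn(32, 1)

    for epoch in range(100):
        optimizer.zero_grad()          # clear old gradients
        loss = criterion(model(x), y)  # forward pass
        loss.backward()                # backpropagation: fills p.grad for each parameter
        optimizer.step()               # gradient descent: reads p.grad and updates p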
asked Dec 30 '18 by aerin

People also ask

What does loss.backward() do in PyTorch?

Suppose the loss function is MSELoss, which computes the mean-squared error between the input and the target. When we call loss.backward(), the whole graph is differentiated w.r.t. the loss, and all tensors in the graph with requires_grad=True will have their .grad attribute accumulated with the gradient.

What does optimizer.step() do in PyTorch?

After computing the gradients for all tensors in the model, calling optimizer.step() makes the optimizer iterate over all parameters (tensors) it is supposed to update and use their internally stored grad to update their values.

What is backward() in PyTorch?

The backward() method computes the gradients during the backward pass in a neural network. The gradients are computed when this method is executed and are stored in the .grad attribute of the respective tensors.

What does optimizer.zero_grad() do?

It sets the gradients of all optimized torch.Tensors to zero.
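
Tying these answers together, here is a small sketch (using a made-up parameter tensor) of how backward() accumulates into .grad, how zero_grad() clears it, and how step() consumes it:

    import torch

    # Hypothetical single parameter; requires_grad=True so autograd tracks it.
    w = torch.tensor([1.0, 2.0], requires_grad=True)
    optimizer = torch.optim.SGD([w], lr=0.1)

    loss = (w ** 2).sum()
    loss.backward()
    print(w.grad)          # tensor([2., 4.]) -- d(loss)/dw = 2*w

    loss = (w ** 2).sum()
    loss.backward()
    print(w.grad)          # tensor([4., 8.]) -- gradients ACCUMULATE without zero_grad()

    optimizer.zero_grad()  # reset .grad (zeros, or None in recent PyTorch versions)

    loss = (w ** 2).sum()
    loss.backward()
    optimizer.step()       # reads w.grad internally and updates w in place
    print(w)               # tensor([0.8000, 1.6000], requires_grad=True)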


2 Answers

Without delving too deep into the internals of pytorch, I can offer a simplistic answer:

Recall that when initializing the optimizer, you explicitly tell it which parameters (tensors) of the model it should be updating. The gradients are "stored" by the tensors themselves (they have grad and requires_grad attributes) once you call backward() on the loss. After computing the gradients for all tensors in the model, calling optimizer.step() makes the optimizer iterate over all parameters (tensors) it is supposed to update and use their internally stored grad to update their values.
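
As a sketch of that link (the nn.Linear model below is just a hypothetical stand-in): the optimizer holds references to the very same parameter tensors the model owns, so the .grad that backward() writes is exactly the .grad that step() reads.

    import torch
    import torch.nn as nn

    model = nn.Linear(3, 1)                                   # hypothetical model
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)   # told WHICH tensors to update

    p_model = next(model.parameters())
    p_optim = optimizer.param_groups[0]["params"][0]
    print(p_model is p_optim)        # True -- same tensor object, not a copy

    loss = model(torch.randn(4, 3)).pow(2).mean()
    print(p_model.grad)              # None -- nothing computed yet
    loss.backward()                  # writes gradients into .grad on those shared tensors
    print(p_model.grad.shape)        # torch.Size([1, 3])
    optimizer.step()                 # reads .grad from the shared tensors and updates them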

More info on computational graphs and the additional "grad" information stored in pytorch tensors can be found in this answer.

The optimizer holding references to the parameters can sometimes cause trouble, e.g., when the model is moved to the GPU after the optimizer is initialized. Make sure you are done setting up your model before constructing the optimizer. See this answer for more details.
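
A small illustration of the recommended ordering (the model and device handling here are placeholders):

    import torch
    import torch.nn as nn

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    model = nn.Linear(3, 1)
    model.to(device)                                          # finish placing the model first...
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)   # ...then construct the optimizer

    # The PyTorch docs recommend this order: moving a model with .to(device)/.cuda()
    # can leave an already-constructed optimizer referencing parameters that are no
    # longer the ones actually being trained.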

answered Oct 15 '22 by Shai


When you call loss.backward(), all it does is compute the gradient of the loss w.r.t. all the parameters in the loss that have requires_grad=True and store them in the parameter.grad attribute for every parameter.

optimizer.step() updates all the parameters based on parameter.grad
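
For vanilla SGD, a rough sketch of what that update amounts to (a manual loop that mirrors optimizer.step(), ignoring momentum, weight decay, etc.):

    import torch
    import torch.nn as nn

    model = nn.Linear(3, 1)       # hypothetical model
    lr = 0.1

    loss = model(torch.randn(4, 3)).pow(2).mean()
    loss.backward()               # fills p.grad for every parameter with requires_grad=True

    # Roughly what optimizer.step() does for plain SGD:
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is not None:
                p -= lr * p.grad  # update each parameter from its own stored gradient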

answered Oct 15 '22 by Ganesh