pytorch - connection between loss.backward() and optimizer.step()

Tags:

machine-learning

neural-network

pytorch

gradient-descent

Where is an explicit connection between the optimizer and the loss?

How does the optimizer know where to get the gradients of the loss without a call like this: optimizer.step(loss)?

-More context-

When I minimize the loss, I didn't have to pass the gradients to the optimizer.

    loss.backward()   # Back Propagation
    optimizer.step()  # Gradient Descent


asked Dec 30 '18 06:12

aerin

2 Answers

Without delving too deep into the internals of pytorch, I can offer a simplistic answer:

Recall that when you initialize the optimizer, you explicitly tell it which parameters (tensors) of the model it should update. The gradients are "stored" by the tensors themselves (each has grad and requires_grad attributes) once you call backward() on the loss. After the gradients have been computed for all tensors in the model, calling optimizer.step() makes the optimizer iterate over all the parameters (tensors) it is supposed to update and use their internally stored grad to update their values.
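To make that implicit link visible, here is a minimal sketch. The toy nn.Linear model, the random batch, and the learning rate are assumptions chosen only for illustration; plain SGD is used because, with no momentum or weight decay, its update is simply p = p - lr * p.grad.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

model = nn.Linear(3, 1)                                   # toy model (assumption)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)   # optimizer is told which tensors to update

# The optimizer keeps references to the very same tensor objects the model owns.
opt_params = optimizer.param_groups[0]["params"]
same_objects = all(p is q for p, q in zip(opt_params, model.parameters()))

x = torch.randn(8, 3)                                     # made-up batch
y = torch.randn(8, 1)
loss = nn.functional.mse_loss(model(x), y)

grad_before = model.weight.grad                           # None: no backward pass yet
loss.backward()                                           # writes .grad on each parameter
grad_after = model.weight.grad.clone()

weight_before = model.weight.detach().clone()
optimizer.step()                                          # reads .grad from the registered tensors
weight_after = model.weight.detach().clone()
```

The loss is never handed to the optimizer: the two only communicate through the .grad attribute of the shared parameter tensors.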

More info on computational graphs and the additional "grad" information stored in pytorch tensors can be found in this answer.

Because the optimizer holds references to the parameters, problems can sometimes arise, e.g., when the model is moved to GPU after the optimizer has been initialized. Make sure you have finished setting up your model before constructing the optimizer. See this answer for more details.
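A minimal sketch of the recommended ordering follows. The model, the choice of Adam, and the dummy loss are assumptions for illustration; the code falls back to CPU when no GPU is available.

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(4, 2)                                     # toy model (assumption)
model.to(device)                                            # 1) finish setting up the model first
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)   # 2) only then construct the optimizer

x = torch.randn(5, 4, device=device)                        # made-up batch on the same device
loss = model(x).pow(2).mean()                               # dummy loss for illustration

optimizer.zero_grad()
loss.backward()
optimizer.step()

# Every tensor the optimizer tracks lives on the same device as the model.
param_devices = {
    p.device.type for group in optimizer.param_groups for p in group["params"]
}
```

Constructing the optimizer last means the tensors it references are already in their final place.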


answered Oct 15 '22 14:10

Shai

When you call loss.backward(), all it does is compute the gradient of the loss w.r.t. every parameter involved in the loss that has requires_grad = True, and store it in that parameter's parameter.grad attribute.

optimizer.step() updates all the parameters based on parameter.grad.
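The same two steps can be sketched by hand, without any optimizer object. The tensors, the loss expression, and the learning rate below are made up for illustration, and the manual update mimics only plain gradient descent, not what every optimizer does.

```python
import torch

torch.manual_seed(0)
lr = 0.1                                        # illustrative learning rate

w = torch.randn(3, requires_grad=True)          # trainable tensor
frozen = torch.randn(3, requires_grad=False)    # excluded from gradient computation
x = torch.randn(3)

loss = (w * frozen * x).sum() ** 2
loss.backward()                                 # fills w.grad; frozen.grad stays None

has_grad = w.grad is not None
frozen_grad = frozen.grad
g = w.grad.clone()

w_before = w.detach().clone()
with torch.no_grad():                           # the update itself must not be tracked by autograd
    w -= lr * w.grad                            # hand-written gradient descent step
w.grad.zero_()                                  # reset before the next backward pass
```

Gradients accumulate across backward() calls, which is why training loops usually reset them (for example with optimizer.zero_grad()) before each new backward pass.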

answered Oct 15 '22 13:10

Ganesh
