From the docs:
requires_grad – Boolean indicating whether the Variable has been created by a subgraph containing any Variable that requires it. Can be changed only on leaf Variables.
requires_grad indicates whether a variable is trainable. By default, requires_grad is False when creating a Variable. If one of the inputs to an operation requires gradient, its output and its subgraph will also require gradient.
Setting the requires_grad parameter allows for fine-grained exclusion of subgraphs from gradient computation. It takes effect in both the forward and backward passes: during the forward pass, an operation is only recorded in the backward graph if at least one of its input tensors requires grad.
requires_grad_(requires_grad=True) → Tensor. Changes whether autograd should record operations on this tensor: it sets the tensor's requires_grad attribute in-place and returns the tensor. The main use case of requires_grad_() is to tell autograd to begin recording operations on an existing Tensor.
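A minimal sketch of this behavior (the tensor names are illustrative, not from the docs):
import torch
x = torch.randn(3)          # requires_grad is False by default
print(x.requires_grad)      # False
x.requires_grad_()          # in-place: autograd now records operations on x
y = torch.randn(3)          # still does not require grad
z = x * y                   # one input requires grad, so the output does too
print(z.requires_grad)      # True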
In PyTorch, leaf nodes are therefore the values from which the computation begins. Here is a simple program illustrating this:
import torch as T
# The following two values are the leaf nodes
x = T.ones(10, requires_grad=True)
y = T.ones(10, requires_grad=True)
# The remaining nodes are not leaves:
def H(z1, z2):
    return T.sigmoid(z1 + z2)  # the original body is truncated here; any torch op on z1 and z2 illustrates the point
Leaf nodes of a graph are those nodes (i.e. Variables) that were not computed directly from other nodes in the graph. For example:
import torch
from torch.autograd import Variable
A = Variable(torch.randn(10,10)) # this is a leaf node
B = 2 * A # this is not a leaf node
w = Variable(torch.randn(10,10)) # this is a leaf node
C = A.mm(w) # this is not a leaf node
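You can check this with the is_leaf attribute. A small sketch (note that tensors with requires_grad=False count as leaves by convention, so the check is most informative on tensors that require grad):
a = torch.randn(10, 10, requires_grad=True)  # created directly -> leaf
b = 2 * a                                    # computed from a  -> not a leaf
print(a.is_leaf)  # True
print(b.is_leaf)  # False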
If a leaf node requires_grad, all subsequent nodes computed from it will automatically also require grad. Otherwise, you could not apply the chain rule to calculate the gradient of the leaf node that requires_grad. This is the reason why requires_grad can only be set on leaf nodes: for all other nodes, it can be smartly inferred and is in fact determined by the settings of the leaf nodes used to compute those variables.
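For illustration, trying to flip the flag on a non-leaf node raises an error (a small sketch; the exact error message may vary between PyTorch versions):
x = torch.ones(3, requires_grad=True)  # leaf
y = x * 2                              # non-leaf, requires_grad inferred as True
try:
    y.requires_grad = False            # not allowed on non-leaf nodes
except RuntimeError as e:
    print(e)                           # only leaf variables' flags can be changed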
Note that in a typical neural network, all parameters are leaf nodes. They are not computed from any other Variables in the network. Hence, freezing layers using requires_grad is simple. Here is an example taken from the PyTorch docs:
import torch.nn as nn
import torch.optim as optim
import torchvision
model = torchvision.models.resnet18(pretrained=True)
for param in model.parameters():
param.requires_grad = False
# Replace the last fully-connected layer
# Parameters of newly constructed modules have requires_grad=True by default
model.fc = nn.Linear(512, 100)
# Optimize only the classifier
optimizer = optim.SGD(model.fc.parameters(), lr=1e-2, momentum=0.9)
Note, though, that what you really do here is freeze the entire gradient computation (which is what you should be doing, as it avoids unnecessary computation). Technically, you could also leave the requires_grad flag on and only define your optimizer for the subset of parameters that you would like to learn.
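A minimal sketch of that alternative (it still computes gradients for the frozen layers, which wastes work):
# Leave all requires_grad flags untouched and simply give the optimizer
# only the classifier's parameters; the rest of the network never updates.
model = torchvision.models.resnet18(pretrained=True)
model.fc = nn.Linear(512, 100)
optimizer = optim.SGD(model.fc.parameters(), lr=1e-2, momentum=0.9)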