
requires_grad relation to leaf nodes

Tags: pytorch, torch

From the docs:

requires_grad – Boolean indicating whether the Variable has been created by a subgraph containing any Variable, that requires it. Can be changed only on leaf Variables

  1. What does it mean by leaf nodes here? Are leaf nodes only the input nodes?
  2. If it can be only changed at the leaf nodes, how can I freeze layers then?
Abhishek Bhatia asked Jul 04 '17


People also ask

What does Requires_grad mean?

requires_grad indicates whether a variable is trainable. By default, requires_grad is False when creating a Variable. If one of the inputs to an operation requires gradient, its output and its subgraphs will also require gradient.

What does Requires_grad mean in PyTorch?

requires_grad is a parameter that allows for fine-grained exclusion of subgraphs from gradient computation. It takes effect in both the forward and backward passes: during the forward pass, an operation is only recorded in the backward graph if at least one of its input tensors requires grad.

What is Requires_grad true?

requires_grad_(requires_grad=True) → Tensor. Changes whether autograd should record operations on this tensor: sets the tensor's requires_grad attribute in-place and returns the tensor. The main use case for requires_grad_() is to tell autograd to begin recording operations on a tensor.
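
A minimal usage sketch (assuming a recent PyTorch release):

    import torch

    x = torch.zeros(3)       # requires_grad defaults to False
    x.requires_grad_()       # in-place: tell autograd to start recording
    print(x.requires_grad)   # True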

What is leaf node in PyTorch?

In PyTorch, leaf nodes are therefore the values from which the computation begins, i.e. tensors created directly by the user rather than produced by an operation. For example, x = torch.ones(10, requires_grad=True) and y = torch.ones(10, requires_grad=True) are both leaf nodes, while any value computed from them (e.g. by a function of x and y) is not a leaf.


1 Answer

  1. Leaf nodes of a graph are those nodes (i.e. Variables) that were not computed directly from other nodes in the graph. For example:

    import torch
    from torch.autograd import Variable
    
    A = Variable(torch.randn(10,10)) # this is a leaf node
    B = 2 * A # this is not a leaf node
    w = Variable(torch.randn(10,10)) # this is a leaf node
    C = A.mm(w) # this is not a leaf node
    

    If a leaf node requires_grad, all subsequent nodes computed from it will automatically require grad as well; otherwise it would be impossible to apply the chain rule to compute the gradient of the leaf node that requires_grad. This is why requires_grad can only be set on leaf nodes: for all other nodes, it can be inferred automatically, and is in fact determined by the settings of the leaf nodes used to compute those variables.
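
    The Variable API above reflects PyTorch as of 2017; in current releases (0.4+), Variable has been merged into Tensor. A minimal sketch of the same ideas with plain tensors (the .is_leaf attribute and the error on non-leaf assignment are from recent PyTorch; exact error wording may vary by version):

    import torch

    A = torch.randn(10, 10, requires_grad=True)  # created by the user: a leaf
    B = 2 * A                                    # computed from A: not a leaf
    print(A.is_leaf, B.is_leaf)                  # True False
    print(B.requires_grad)                       # True, inferred from A

    try:
        B.requires_grad = False                  # only allowed on leaf tensors
    except RuntimeError as e:
        print(e)  # "you can only change requires_grad flags of leaf variables..."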

  2. Note that in a typical neural network, all parameters are leaf nodes: they are not computed from any other Variables in the network. Hence, freezing layers using requires_grad is simple. Here is an example taken from the PyTorch docs:

    import torch.nn as nn
    import torch.optim as optim
    import torchvision

    model = torchvision.models.resnet18(pretrained=True)
    for param in model.parameters():
        param.requires_grad = False
    
    # Replace the last fully-connected layer
    # Parameters of newly constructed modules have requires_grad=True by default
    model.fc = nn.Linear(512, 100)
    
    # Optimize only the classifier
    optimizer = optim.SGD(model.fc.parameters(), lr=1e-2, momentum=0.9)
    

    Note that what you are really doing here is freezing the entire gradient computation (which is what you should be doing, since it avoids unnecessary computation). Technically, you could also leave the requires_grad flag on and simply define your optimizer over only the subset of parameters that you would like to learn.
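
    A minimal sketch of that alternative (gradients are then still computed for the frozen layers on every backward pass, they are just never applied):

    import torch.nn as nn
    import torch.optim as optim
    import torchvision

    model = torchvision.models.resnet18(pretrained=True)
    model.fc = nn.Linear(512, 100)

    # No requires_grad loop: every parameter still requires grad, so
    # backward() computes gradients for the whole network, but the
    # optimizer only ever updates the classifier's parameters.
    optimizer = optim.SGD(model.fc.parameters(), lr=1e-2, momentum=0.9)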

mbpaulus answered Sep 22 '22