
How does PyTorch compute the gradients for a simple linear regression model?

I am using PyTorch and trying to understand how a simple linear regression model works.

I'm using a simple LinearRegressionModel class:

import torch
import torch.nn as nn
from torch.autograd import Variable

class LinearRegressionModel(nn.Module):
    def __init__(self, input_dim, output_dim):
        super(LinearRegressionModel, self).__init__()
        self.linear = nn.Linear(input_dim, output_dim)  

    def forward(self, x):
        out = self.linear(x)
        return out

model = LinearRegressionModel(1, 1)

Next, I instantiate a loss criterion and an optimizer:

criterion = nn.MSELoss()

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

Finally, to train the model, I use the following code:

for epoch in range(epochs):
    # Wrap the numpy training data (and move it to the GPU if one is available)
    inputs = Variable(torch.from_numpy(x_train))
    labels = Variable(torch.from_numpy(y_train))
    if torch.cuda.is_available():
        inputs, labels = inputs.cuda(), labels.cuda()

    # Clear gradients w.r.t. parameters
    optimizer.zero_grad() 

    # Forward to get output
    outputs = model(inputs)

    # Calculate Loss
    loss = criterion(outputs, labels)

    # Getting gradients w.r.t. parameters
    loss.backward()

    # Updating parameters
    optimizer.step()

My question is: how does the optimizer get the gradients computed by loss.backward() so that it can update the parameters in its step() method? How are the model, the loss criterion, and the optimizer tied together?


Dimitris Poulopoulos


2 Answers

PyTorch has this concept of tensors and variables. When you use nn.Linear, the layer creates two variables, namely W and b. In PyTorch, a variable is a wrapper that encapsulates a tensor, its gradient, and information about the function that created it. You can directly access the gradient via

w.grad

If you try it before calling loss.backward(), you get None. Once you call loss.backward(), it will contain the gradients. You can then update the parameter manually with this simple step:

w.data -= learning_rate * w.grad.data
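
To make that concrete, here is a minimal, self-contained sketch of that manual update (the toy x/y data and the 100-step loop are made up for illustration, not taken from the question): every parameter's .grad starts out as None, loss.backward() fills it in, and the update and the zeroing are then done by hand.

import torch
import torch.nn as nn

# Made-up toy data roughly following y = 2x + 1
x = torch.tensor([[1.0], [2.0], [3.0], [4.0]])
y = torch.tensor([[3.0], [5.0], [7.0], [9.0]])

model = nn.Linear(1, 1)           # owns the variables W and b
criterion = nn.MSELoss()
learning_rate = 0.01

print(model.weight.grad)          # None: backward() has not run yet

for _ in range(100):
    loss = criterion(model(x), y)
    loss.backward()               # fills model.weight.grad and model.bias.grad

    for p in model.parameters():
        p.data -= learning_rate * p.grad.data   # the manual update shown above
        p.grad.zero_()                          # reset, like optimizer.zero_grad()

print(model.weight.grad)          # now a tensor (zeroed after the last step), no longer None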

With a complex network, repeating that simple step for every parameter quickly becomes tedious, so optimisers such as SGD and Adam take care of it for you. When you create the optimiser object, you pass in the parameters of your model: nn.Module provides a parameters() method that returns all the learnable parameters to the optimiser. That is exactly what this line does:

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
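
For plain SGD (no momentum or weight decay), step() essentially loops over those parameters and applies the same update shown earlier. A rough, hand-written stand-in (the ManualSGD name is mine; this is not the real torch.optim source) would look like this:

import torch.nn as nn

class ManualSGD:
    """Illustrative stand-in for torch.optim.SGD without momentum or weight decay."""

    def __init__(self, params, lr):
        self.params = list(params)        # the tensors handed over by model.parameters()
        self.lr = lr

    def zero_grad(self):
        for p in self.params:
            if p.grad is not None:
                p.grad.zero_()            # clear the gradients stored on each parameter

    def step(self):
        for p in self.params:
            if p.grad is not None:
                # use the gradient that loss.backward() left on the parameter
                p.data -= self.lr * p.grad.data

model = nn.Linear(1, 1)
optimizer = ManualSGD(model.parameters(), lr=0.01)   # drop-in for the line above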

Vishnu Subramanian


loss.backward()

computes the gradients and stores them on the parameters themselves (in each parameter's .grad attribute). You then tell the optimizer which parameters it needs to tune here:

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
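
A quick way to convince yourself of this (the one-sample x/y values below are made up for illustration) is to check that the optimizer holds references to the very same tensors as the model, so backward() and step() read and write shared state:

import torch
import torch.nn as nn

model = nn.Linear(1, 1)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# The optimizer holds the *same* parameter tensors as the model
print(optimizer.param_groups[0]['params'][0] is model.weight)   # True

x = torch.tensor([[1.0]])
y = torch.tensor([[2.0]])

optimizer.zero_grad()
loss = criterion(model(x), y)
loss.backward()                        # writes model.weight.grad and model.bias.grad

before = model.weight.detach().clone()
optimizer.step()                       # reads those .grad fields and updates in place
print(model.weight.detach() - before)  # nonzero: the shared parameters were updated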

catethos