I am using PyTorch and trying to understand how a simple linear regression model works.
I'm using a simple LinearRegressionModel class:
import torch
import torch.nn as nn
from torch.autograd import Variable

class LinearRegressionModel(nn.Module):
    def __init__(self, input_dim, output_dim):
        super(LinearRegressionModel, self).__init__()
        # A single fully connected layer: out = x @ W.T + b
        self.linear = nn.Linear(input_dim, output_dim)

    def forward(self, x):
        out = self.linear(x)
        return out

model = LinearRegressionModel(1, 1)
Next, I instantiate a loss criterion and an optimizer:
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
Finally, to train the model I use the following code:
for epoch in range(epochs):
    # Convert the NumPy training data to tensors (on the GPU when available)
    if torch.cuda.is_available():
        inputs = Variable(torch.from_numpy(x_train).cuda())
        labels = Variable(torch.from_numpy(y_train).cuda())
    else:
        inputs = Variable(torch.from_numpy(x_train))
        labels = Variable(torch.from_numpy(y_train))
    # Clear gradients w.r.t. parameters
    optimizer.zero_grad()
    # Forward to get output
    outputs = model(inputs)
    # Calculate Loss
    loss = criterion(outputs, labels)
    # Getting gradients w.r.t. parameters
    loss.backward()
    # Updating parameters
    optimizer.step()
My question is: how does the optimizer get the loss gradient, computed by loss.backward(), to update the parameters using the step() method? How are the model, the loss criterion and the optimizer tied together?
PyTorch has the concepts of tensors and variables. When you use nn.Linear, it creates two variables, namely W and b. In PyTorch, a variable is a wrapper that encapsulates a tensor, its gradient, and information about the function that created it. (Since PyTorch 0.4, Variable has been merged into Tensor, so the same holds for any tensor with requires_grad=True.) You can access the gradients directly with:
w.grad
If you read it before calling loss.backward(), you get None. Once you call loss.backward(), it contains the gradients. You can then update the parameters manually using these gradients with the simple step below.
w.data -= learning_rate * w.grad.data
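To make the manual route concrete, here is a minimal sketch. It assumes a bare nn.Linear(1, 1) layer and a made-up three-point dataset (both are illustrative, not from the question), and it uses torch.no_grad() rather than .data, which is the usual way to write the same update in current PyTorch.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
linear = nn.Linear(1, 1)          # creates the weight W and the bias b
w = linear.weight

# Illustrative data only (assumption): y = 2x
x = torch.tensor([[1.0], [2.0], [3.0]])
y = torch.tensor([[2.0], [4.0], [6.0]])

grad_before = w.grad              # None: nothing has been backpropagated yet

loss = nn.MSELoss()(linear(x), y)
loss.backward()                   # fills w.grad and linear.bias.grad

grad_after = w.grad.clone()       # now a tensor with the same shape as w
w_before = w.detach().clone()

learning_rate = 0.01
with torch.no_grad():             # keep the update itself out of the graph
    w -= learning_rate * w.grad   # same effect as w.data -= lr * w.grad.data
```

Note that only w is updated here; the bias would need the same line, which is exactly the bookkeeping an optimizer takes over.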
When you have a complex network, this simple step can become unwieldy, so optimizers such as SGD and Adam take care of it. When you create an optimizer object, you pass in the parameters of your model. nn.Module provides the parameters() function, which returns all the learnable parameters. Because the optimizer keeps references to those same parameter objects, step() can read the gradients that loss.backward() stored on them. The parameters are handed over in the step below.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
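The link between model and optimizer can be checked directly. The sketch below assumes a bare nn.Linear(1, 1) model and toy data (illustrative only); it looks at optimizer.param_groups to confirm that the optimizer holds the very same tensor objects as the model, and that one step() of plain SGD (no momentum, no weight decay) matches the manual update rule.

```python
import torch
import torch.nn as nn

model = nn.Linear(1, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# The optimizer stores references to the model's parameter tensors, not copies
opt_params = optimizer.param_groups[0]["params"]
same_objects = all(p is q for p, q in zip(model.parameters(), opt_params))

# Illustrative data only (assumption)
x = torch.tensor([[1.0], [2.0]])
y = torch.tensor([[3.0], [5.0]])
loss = nn.MSELoss()(model(x), y)
loss.backward()                   # writes p.grad on each parameter

# What the manual rule p <- p - lr * p.grad would produce
expected = [p.detach() - 0.01 * p.grad for p in model.parameters()]

optimizer.step()                  # reads p.grad and updates p in place
matches = all(torch.allclose(p, e) for p, e in zip(model.parameters(), expected))
```

Neither the loss nor the optimizer knows about the other; they only meet through the .grad attribute of the shared parameters.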
loss.backward() calculates the gradients and stores them in the parameters (in each parameter's .grad attribute). You tell the optimizer which parameters need to be tuned here:
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
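As a rough illustration of what happens inside, here is a stripped-down optimizer. TinySGD is a hypothetical class written for this answer, not PyTorch's actual implementation (which also handles momentum, weight decay, parameter groups and more), and the data is made up.

```python
import torch
import torch.nn as nn

class TinySGD:
    """Hypothetical, simplified stand-in for torch.optim.SGD."""

    def __init__(self, params, lr):
        self.params = list(params)   # keep references to the parameters
        self.lr = lr

    def zero_grad(self):
        # Discard gradients left over from the previous iteration
        for p in self.params:
            p.grad = None

    def step(self):
        # Read the gradients that loss.backward() stored on each parameter
        with torch.no_grad():
            for p in self.params:
                if p.grad is not None:
                    p -= self.lr * p.grad

model = nn.Linear(1, 1)
criterion = nn.MSELoss()
optimizer = TinySGD(model.parameters(), lr=0.01)

# Illustrative data only (assumption): y = 2x
x = torch.tensor([[1.0], [2.0], [3.0]])
y = torch.tensor([[2.0], [4.0], [6.0]])

losses = []
for epoch in range(100):
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())
```

The training loop is the same as in the question; only the optimizer has been swapped, which shows that step() needs nothing from the loss beyond the gradients already attached to the parameters.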