I am trying to manually implement gradient descent in PyTorch as a learning exercise. I have the following to create my synthetic dataset:
import torch
torch.manual_seed(0)
N = 100
x = torch.rand(N,1)*5
# Let the following command be the true function
y = 2.3 + 5.1*x
# Get some noisy observations
y_obs = y + 2*torch.randn(N,1)
Then I create my predictive function (y_pred) as shown below.
w = torch.randn(1, requires_grad=True)
b = torch.randn(1, requires_grad=True)
y_pred = w*x+b
mse = torch.mean((y_pred-y_obs)**2)
which uses MSE to infer the weights w and b. I use the block below to update the values according to the gradient.
gamma = 1e-2
for i in range(100):
    w = w - gamma * w.grad
    b = b - gamma * b.grad
    mse.backward()
However, the loop only works on the first iteration. From the second iteration onwards, w.grad is set to None. I am fairly sure this happens because I am setting w as a function of itself (though I might be wrong).
The question is: how do I update the weights properly with the gradient information?
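That suspicion is essentially right: w - gamma * w.grad is the result of an operation, so the name w gets rebound to a new, non-leaf tensor, and autograd only populates .grad on leaf tensors by default. The working example further down avoids this by doing the update inside torch.no_grad() and re-enabling requires_grad afterwards. A small standalone sketch of the effect (illustrative, not the exact code above):

import torch

w = torch.randn(1, requires_grad=True)    # leaf tensor tracked by autograd
loss = (3.0 * w).sum()
loss.backward()
print(w.is_leaf, w.grad)                  # True tensor([3.])

gamma = 1e-2
w = w - gamma * w.grad                    # rebinds w to a new, non-leaf tensor
print(w.is_leaf, w.grad)                  # False None (PyTorch also warns about reading .grad of a non-leaf)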
PyTorch computes the gradient of a function with respect to its inputs using automatic differentiation: given the computational graph of the function, autograd calculates the gradients of the inputs. Automatic differentiation can be performed in two ways, forward mode and reverse mode; PyTorch's .backward() uses reverse mode.
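As a small, self-contained illustration (not tied to the question's data), torch.autograd.grad evaluates such gradients directly, and the .backward() calls used below are the reverse-mode path:

import torch

# gradient of f(u, v) = sum(u*v + v**2) with respect to its inputs
u = torch.randn(3, requires_grad=True)
v = torch.randn(3, requires_grad=True)
f = (u * v + v ** 2).sum()

du, dv = torch.autograd.grad(f, (u, v))
print(du)   # equals v
print(dv)   # equals u + 2*v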
The following code works fine on my computer and gives approximately w = 5.1 and b = 2.2 after 500 training iterations.
Code:
import torch
torch.manual_seed(0)
N = 100
x = torch.rand(N,1)*5
# Let the following command be the true function
y = 2.3 + 5.1*x
# Get some noisy observations
y_obs = y + 0.2*torch.randn(N,1)
w = torch.randn(1, requires_grad=True)
b = torch.randn(1, requires_grad=True)
gamma = 0.01
for i in range(500):
    print(i)

    # recompute the prediction and loss with the current weights
    y_pred = w * x + b
    mse = torch.mean((y_pred - y_obs) ** 2)

    # backward pass populates w.grad and b.grad
    mse.backward()

    print('w:', w)
    print('b:', b)
    print('w.grad:', w.grad)
    print('b.grad:', b.grad)

    # gradient descent step; don't track these operations in the graph
    with torch.no_grad():
        w = w - gamma * w.grad
        b = b - gamma * b.grad

    # the updates above rebound w and b to new tensors, so re-enable gradient tracking
    w.requires_grad = True
    b.requires_grad = True
Output:
499
w: tensor([5.1095], requires_grad=True)
b: tensor([2.2474], requires_grad=True)
w.grad: tensor([0.0179])
b.grad: tensor([-0.0576])
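For comparison, the same manual update can keep w and b as the original leaf tensors by modifying them in place under torch.no_grad() and zeroing the gradients each iteration. A sketch of that variant (same kind of data setup as above, condensed):

import torch
torch.manual_seed(0)
N = 100
x = torch.rand(N, 1) * 5
y_obs = 2.3 + 5.1 * x + 0.2 * torch.randn(N, 1)

w = torch.randn(1, requires_grad=True)
b = torch.randn(1, requires_grad=True)
gamma = 0.01

for i in range(500):
    mse = torch.mean((w * x + b - y_obs) ** 2)
    mse.backward()
    with torch.no_grad():
        w -= gamma * w.grad      # in-place update keeps w the same leaf tensor
        b -= gamma * b.grad
        w.grad.zero_()           # clear accumulated gradients before the next pass
        b.grad.zero_()

print(w, b)   # should approach roughly 5.1 and 2.3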