 

PyTorch Gradient Descent

Tags: python, pytorch

I am trying to manually implement gradient descent in PyTorch as a learning exercise. I have the following to create my synthetic dataset:

import torch
torch.manual_seed(0)
N = 100
x = torch.rand(N,1)*5
# Let the following command be the true function
y = 2.3 + 5.1*x
# Get some noisy observations
y_obs = y + 2*torch.randn(N,1)

Then I create my predictive function (y_pred) as shown below.

w = torch.randn(1, requires_grad=True)
b = torch.randn(1, requires_grad=True)
y_pred = w*x+b
mse = torch.mean((y_pred-y_obs)**2)

which uses the MSE loss to infer the weights w and b. I then use the block below to update the values according to the gradient.

gamma = 1e-2
for i in range(100):
  w = w - gamma *w.grad
  b = b - gamma *b.grad
  mse.backward()

However, the loop only works on the first iteration. From the second iteration onwards, w.grad is None. I am fairly sure this happens because I am setting w as a function of itself (though I might be wrong).

My question is: how do I properly update the weights using the gradient information?

asked Sep 06 '18 by sachinruk



1 Answer

  1. Call backward() to compute the gradients before you apply the gradient-descent update.
  2. Recompute the loss with the updated weights on every iteration.
  3. Perform the update without gradient tracking, creating new leaf tensors each iteration (a short illustration of why this matters follows below).

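To see why the original loop loses the gradient: reassigning w builds a brand-new tensor that is no longer a leaf of the autograd graph, and .grad is only populated on leaf tensors. A minimal sketch of just that point (the loss here is arbitrary, not the MSE from the question):

import torch

w = torch.randn(1, requires_grad=True)    # leaf tensor tracked by autograd
loss = (3 * w).sum()
loss.backward()
print(w.is_leaf, w.grad)                  # True tensor([3.])

# Re-assigning w produces a new, non-leaf tensor inside the graph;
# its .grad stays None (recent PyTorch versions also warn when you read it).
w = w - 0.01 * w.grad
print(w.is_leaf, w.grad)                  # False None
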
The following code works fine on my computer and gives roughly w = 5.1 and b = 2.2 after 500 training iterations.

Code:

import torch
torch.manual_seed(0)
N = 100
x = torch.rand(N,1)*5
# Let the following command be the true function
y = 2.3 + 5.1*x
# Get some noisy observations
y_obs = y + 0.2*torch.randn(N,1)

w = torch.randn(1, requires_grad=True)
b = torch.randn(1, requires_grad=True)


gamma = 0.01
for i in range(500):
    print(i)
    # use new weight to calculate loss
    y_pred = w * x + b
    mse = torch.mean((y_pred - y_obs) ** 2)

    # backward
    mse.backward()
    print('w:', w)
    print('b:', b)
    print('w.grad:', w.grad)
    print('b.grad:', b.grad)

    # gradient descent, don't track
    with torch.no_grad():
        w = w - gamma * w.grad
        b = b - gamma * b.grad
    w.requires_grad = True
    b.requires_grad = True

Output:

499
w: tensor([5.1095], requires_grad=True)
b: tensor([2.2474], requires_grad=True)
w.grad: tensor([0.0179])
b.grad: tensor([-0.0576])
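
A rough sketch of a slightly more common variant, reusing the same data and hyperparameters: keep w and b as the same leaf tensors for the whole run, update them in place inside torch.no_grad(), and zero the gradients afterwards, so requires_grad never has to be reset (this is an alternative pattern, not the code above):

import torch
torch.manual_seed(0)
N = 100
x = torch.rand(N, 1) * 5
y_obs = 2.3 + 5.1 * x + 0.2 * torch.randn(N, 1)

w = torch.randn(1, requires_grad=True)
b = torch.randn(1, requires_grad=True)

gamma = 0.01
for i in range(500):
    y_pred = w * x + b
    mse = torch.mean((y_pred - y_obs) ** 2)
    mse.backward()

    with torch.no_grad():
        w -= gamma * w.grad      # in-place update keeps w a leaf tensor
        b -= gamma * b.grad
        w.grad.zero_()           # clear accumulated gradients before the next pass
        b.grad.zero_()

print(w, b)                      # should end up close to 5.1 and 2.3

The same loop can also be written with torch.optim.SGD([w, b], lr=gamma), replacing the manual update block with optimizer.step() followed by optimizer.zero_grad().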
answered Oct 29 '22 by Robert