How to apply gradients manually in PyTorch

I'm starting to learn PyTorch and was trying to do something very simple: move a randomly initialized vector of size 5 toward the target vector [1, 2, 3, 4, 5].

But my distance is not decreasing, and my vector x just goes crazy. No idea what I am missing.

import torch
import numpy as np
from torch.autograd import Variable

# regress a vector to the goal vector [1,2,3,4,5]

dtype = torch.cuda.FloatTensor # Uncomment this to run on GPU

x = Variable(torch.rand(5).type(dtype), requires_grad=True)
target = Variable(torch.FloatTensor([1,2,3,4,5]).type(dtype), 
requires_grad=False)
distance = torch.mean(torch.pow((x - target), 2))

for i in range(100):
  distance.backward(retain_graph=True)
  x_grad = x.grad
  x.data.sub_(x_grad.data * 0.01)
asked Mar 07 '18 at 14:03 by Evan Pu


People also ask

How do you access PyTorch gradients?

The gradients are the same as the partial derivatives. For example, in the function y = 2*x + 1, x is a tensor with requires_grad=True. We can compute the gradients by calling y.backward(), and the gradient can then be accessed as x.grad.
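
For example, a minimal sketch using current PyTorch tensors (rather than the old Variable API):

import torch

x = torch.tensor(3.0, requires_grad=True)
y = 2 * x + 1   # builds the computational graph
y.backward()    # reverse-mode autodiff fills x.grad with dy/dx
print(x.grad)   # tensor(2.)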

How does PyTorch gradient work?

PyTorch computes the gradient of a function with respect to the inputs by using automatic differentiation. Automatic differentiation is a technique that, given a computational graph, calculates the gradients of the inputs. Automatic differentiation can be performed in two different ways: forward mode and reverse mode.

How do you zero out gradients in PyTorch?

You can also use model.zero_grad(). This is the same as using optimizer.zero_grad() as long as all your model's parameters are in that optimizer.
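
A minimal sketch of both variants (the nn.Linear model and SGD optimizer here are just placeholder choices):

import torch
import torch.nn as nn

model = nn.Linear(5, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

loss = model(torch.rand(1, 5)).sum()
loss.backward()

optimizer.zero_grad()  # clears .grad on every parameter the optimizer manages
# model.zero_grad()    # equivalent here, since all of the model's parameters are in the optimizer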

What is Torch No_grad ()?

"with torch.no_grad()" is a context manager: operations performed inside the block are not tracked by autograd, so the resulting tensors have requires_grad set to False and are detached from the current computational graph.
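
For example, a small sketch:

import torch

x = torch.rand(5, requires_grad=True)

with torch.no_grad():
  y = x * 2  # no graph is recorded inside this block

print(y.requires_grad)  # False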


1 Answer

There are two errors in your code that prevent you from getting the desired result.

The first error is that the distance calculation should be inside the loop: the distance is the loss here, and it has to be recomputed from the updated x in every iteration, otherwise backward() keeps differentiating the stale graph built from the original x.

The second error is that you should manually zero out x.grad, because PyTorch accumulates gradients in .grad by default rather than overwriting them.

The following is example code that works as expected:

import torch
import numpy as np
from torch.autograd import Variable
import matplotlib.pyplot as plt

# regress a vector to the goal vector [1,2,3,4,5]

dtype = torch.cuda.FloatTensor # change this to torch.FloatTensor to run on CPU

x = Variable(torch.rand(5).type(dtype), requires_grad=True)
target = Variable(torch.FloatTensor([1, 2, 3, 4, 5]).type(dtype), requires_grad=False)

lr = 0.01 # the learning rate

d = []
for i in range(1000):
  distance = torch.mean(torch.pow((x - target), 2))  # recompute the loss from the current x
  d.append(distance.data)  # record the loss for plotting
  distance.backward(retain_graph=True)

  x.data.sub_(lr * x.grad.data)  # gradient descent step: x <- x - lr * grad
  x.grad.data.zero_()  # reset the gradient so it does not accumulate

print(x.data)

fig, ax = plt.subplots()
ax.plot(d)
ax.set_xlabel("iteration")
ax.set_ylabel("distance")
plt.show()

The following is the graph of distance w.r.t. iteration:

[plot: distance vs. iteration, lr = 0.01]

We can see that the model converges after about 600 iterations. If we set the learning rate higher (e.g., lr = 0.1), the model converges much faster (it takes about 60 iterations; see the image below).

[plot: distance vs. iteration, lr = 0.1]

Now, x becomes something like the following:

0.9878 1.9749 2.9624 3.9429 4.9292

which is pretty close to your target of [1, 2, 3, 4, 5].
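
Incidentally, the manual update x.data.sub_(lr * x.grad.data) followed by zeroing the gradient is essentially what an SGD optimizer step does for you; a minimal sketch of the same regression with torch.optim.SGD on current PyTorch (no Variable) could look like this:

import torch

x = torch.rand(5, requires_grad=True)
target = torch.tensor([1., 2., 3., 4., 5.])
optimizer = torch.optim.SGD([x], lr=0.1)

for i in range(100):
  optimizer.zero_grad()
  distance = torch.mean((x - target) ** 2)
  distance.backward()
  optimizer.step()  # x <- x - lr * x.grad

print(x)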

answered Oct 01 '22 at 04:10 by jdhao