PyTorch autograd.grad: how to pass the parameters for multiple outputs?

The documentation of torch.autograd.grad states, for its parameters:

parameters:

outputs (sequence of Tensor) – outputs of the differentiated function.

inputs (sequence of Tensor) – Inputs w.r.t. which the gradient will be returned (and not accumulated into .grad).

I tried the following:

import torch

a = torch.rand(2, requires_grad=True)
b = torch.rand(2, requires_grad=True)
c = a + b
d = a - b

torch.autograd.grad([c, d], [a, b])  # ValueError: only one element tensors can be converted to Python scalars
torch.autograd.grad(torch.tensor([c, d]), torch.tensor([a, b]))  # RuntimeError: grad can be implicitly created only for scalar outputs

I would like to get the gradients of a list of tensors w.r.t. another list of tensors. What is the correct way to pass the parameters?

asked Oct 15 '22 by Tengerye

2 Answers

As the documentation of torch.autograd.grad says, it computes and returns the sum of gradients of the outputs w.r.t. the inputs. Since your c and d are not scalar tensors, the grad_outputs argument is required.

import torch

a = torch.rand(2, requires_grad=True)
b = torch.rand(2, requires_grad=True)

a
# tensor([0.2308, 0.2388], requires_grad=True)

b
# tensor([0.6314, 0.7867], requires_grad=True)

c = a*a + b*b
d = 2*a + 4*b

torch.autograd.grad([c, d], inputs=[a, b], grad_outputs=[torch.Tensor([1., 1.]), torch.Tensor([1., 1.])])
# (tensor([2.4616, 2.4776]), tensor([5.2628, 5.5734]))

Explanation: dc/da = 2*a = [0.4616, 0.4776] and dd/da = [2., 2.], so the first output is dc/da*grad_outputs[0] + dd/da*grad_outputs[1] = [2.4616, 2.4776]. The second output follows from the same calculation with dc/db = 2*b = [1.2628, 1.5734] and dd/db = [4., 4.], giving [5.2628, 5.5734].
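As a quick numerical check, here is a sketch that verifies the explanation (ga and gb are illustrative names; torch.ones_like produces the same grad_outputs as torch.Tensor([1., 1.]) above):

import torch

a = torch.rand(2, requires_grad=True)
b = torch.rand(2, requires_grad=True)
c = a*a + b*b
d = 2*a + 4*b

ga, gb = torch.autograd.grad([c, d], [a, b], grad_outputs=[torch.ones_like(c), torch.ones_like(d)])

# The summed gradients: dc/da + dd/da = 2*a + 2, and dc/db + dd/db = 2*b + 4
assert torch.allclose(ga, 2*a + 2)
assert torch.allclose(gb, 2*b + 4)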

If you want the gradients of c and d w.r.t. the inputs separately, you can do this:

a = torch.rand(2, requires_grad=True)
b = torch.rand(2, requires_grad=True)

a
# tensor([0.9566, 0.6066], requires_grad=True)
b
# tensor([0.5248, 0.4833], requires_grad=True)

c = a*a + b*b
d = 2*a+4*b

[torch.autograd.grad(t, inputs=[a,b], grad_outputs=[torch.Tensor([1.,1.])]) for t in [c,d]]
# [(tensor([1.9133, 1.2132]), tensor([1.0496, 0.9666])),
# (tensor([2., 2.]), tensor([4., 4.]))]
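
Note that c and d above happen to have disjoint computation graphs, so looping over them works as-is. If the outputs share intermediate nodes, the first grad() call frees the graph and the second call raises an error; pass retain_graph=True in that case. A minimal sketch (s is an illustrative shared intermediate):

import torch

a = torch.rand(2, requires_grad=True)
b = torch.rand(2, requires_grad=True)

s = a * b   # intermediate node shared by both outputs
c = s + b*b
d = 2*s + 4*b

# retain_graph=True keeps the shared graph alive for the second call
grads = [torch.autograd.grad(t, inputs=[a, b], grad_outputs=[torch.ones_like(t)], retain_graph=True)
         for t in [c, d]]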
answered Oct 21 '22 by zihaozhihao

In the example you gave:

import torch

a = torch.rand(2, requires_grad=True)
b = torch.rand(2, requires_grad=True)
loss = a + b

Since loss is a vector with 2 elements, you can't call autograd.grad on it directly without supplying grad_outputs.

Typically, you reduce it to a scalar first:

loss = torch.sum(a + b)
torch.autograd.grad([loss], [a, b])

This returns the correct gradients for the loss tensor, which now contains a single element. You can also pass multiple scalar tensors to the outputs argument of torch.autograd.grad.
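For instance, a sketch mirroring the question's c = a + b and d = a - b (loss1 and loss2 are illustrative names):

import torch

a = torch.rand(2, requires_grad=True)
b = torch.rand(2, requires_grad=True)

loss1 = torch.sum(a + b)
loss2 = torch.sum(a - b)

# Gradients of the two scalar losses are summed per input
torch.autograd.grad([loss1, loss2], [a, b])
# (tensor([2., 2.]), tensor([0., 0.]))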

answered Oct 21 '22 by tejas