Why does PyTorch autograd need a scalar?

I am working through "Deep Learning for Coders with fastai & PyTorch". Chapter 4 introduces the autograd functionality from the PyTorch library with a trivial example.

import torch
from torch import tensor  # the book's `tensor` helper comes from fastai, but plain torch.tensor behaves the same here

x = tensor([3., 4., 10.]).requires_grad_()
def f(q): return sum(q**2)  # Python's built-in sum adds the squared elements into a single scalar
y = f(x)
y.backward()

My question boils down to this: the result of y = f(x) is tensor(125., grad_fn=<AddBackward0>), but what does that even mean? Why would I sum the values of three completely different inputs?

I get that using .backward() in this case is shorthand for .backward(tensor([1., 1., 1.])), but I don't see how summing 3 unrelated numbers in a list helps get the gradient of anything. What am I not understanding?

I'm not looking for a grad-level explanation here. The subtitle of the book I'm using is AI Applications Without a Ph.D. My experience with gradients from school is that I should be getting a FUNCTION back, but I understand that isn't the case with autograd. A graph of this short example would be helpful, but the ones I see online usually include too many parameters, weights, and biases to be useful; my mind gets lost in the paths.

Asked Jul 26 '21 by Mack

1 Answer

TL;DR: the derivative of a sum of functions is the sum of their derivatives

Let x be your input vector with components x_i (where i in [0,n]), let y = x**2 (element-wise), and let L = sum(y_i). You are looking to compute dL/dx, a vector of the same size as x whose components are the dL/dx_j (where j in [0,n]).

For j in [0,n], dL/dx_j is simply dy_j/dx_j (the derivative of a sum is the sum of the derivatives, and only one of the terms is nonzero), which is d(x_j**2)/dx_j, i.e. 2*x_j. Therefore, dL/dx = [2*x_j for j in [0,n]], i.e. 2*x. For the question's input x = [3., 4., 10.], that is [6., 8., 20.].
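
To make that concrete, here is a minimal check in plain PyTorch (using torch.tensor directly rather than the book's tensor helper, and .sum() in place of the question's f); dL/dx should come out as 2*x = [6., 8., 20.]:

import torch

x = torch.tensor([3., 4., 10.], requires_grad=True)
L = (x**2).sum()   # L = 9 + 16 + 100 = 125
L.backward()       # fills x.grad with dL/dx
print(x.grad)      # tensor([ 6.,  8., 20.])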

This is the result you get in x.grad when computing the gradient of L with respect to x, either through the scalar sum:

y = f(x)
y.backward()

or by backpropagating from the element-wise y = x**2 with an explicit vector of ones:

y = x**2                        # element-wise: y_j = x_j**2
y.backward(torch.ones_like(x))  # passing ones gives the same result as summing y first and calling backward()
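
As a quick sanity check that the two routes agree, here is a small sketch (again using plain torch.tensor; fresh tensors a and b are used because gradients accumulate in .grad across backward calls):

import torch

# Scalar route: reduce to a single number, then call backward() with no argument
a = torch.tensor([3., 4., 10.], requires_grad=True)
(a**2).sum().backward()

# Vector route: keep y element-wise and pass the vector of ones explicitly
b = torch.tensor([3., 4., 10.], requires_grad=True)
(b**2).backward(torch.ones_like(b))

print(torch.equal(a.grad, b.grad))  # True -- both are tensor([ 6.,  8., 20.])
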
Answered Oct 28 '22 by Ivan