I am working through "Deep Learning for Coders with fastai & PyTorch". Chapter 4 introduces PyTorch's autograd functionality with a trivial example.
x = tensor([3.,4.,10.]).requires_grad_()
def f(q): return sum(q**2)
y = f(x)
y.backward()
My question boils down to this: the result of y = f(x) is tensor(125., grad_fn=<AddBackward0>), but what does that even mean? Why would I sum the values of three completely different inputs?
I get that using .backward() in this case is shorthand for .backward(tensor([1.,1.,1.])), but I don't see how summing 3 unrelated numbers in a list helps get the gradient of anything. What am I not understanding?
I'm not looking for a grad-level explanation here; the subtitle of the book I'm using is AI Applications Without a Ph.D. My experience with gradients from school is that I should be getting a FUNCTION back, but I understand that isn't the case with Autograd. A graph of this short example would be helpful, but the ones I see online usually include too many parameters or weights and biases to be useful, and my mind gets lost in the paths.
TL;DR: the derivative of a sum of functions is the sum of their derivatives.
Let x be your input vector made of components x_i (where i in [0, n]), let y = x**2, and let L = sum(y_i). You are looking to compute dL/dx, a vector of the same size as x whose components are the dL/dx_j (where j in [0, n]).
For j in [0, n], dL/dx_j is simply dy_j/dx_j (the derivative of the sum is the sum of the derivatives, and only one of them is different from zero), which is d(x_j**2)/dx_j, i.e. 2*x_j. Therefore, dL/dx = [2*x_j for j in [0, n]].
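Concretely, with the question's input x = [3., 4., 10.], this gives L = 3**2 + 4**2 + 10**2 = 125 and dL/dx = [2*3, 2*4, 2*10] = [6., 8., 20.].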
This is the result you get in x.grad when either computing the gradient of x as:
y = f(x)
y.backward()
or the gradient of each component of x separately:
y = x**2
y.backward(torch.ones_like(x))
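For completeness, here is a minimal, self-contained sketch (using plain torch.tensor instead of the book's fastai tensor helper, with the same values as the question) that runs both paths and prints the same gradient:
import torch

# The function from the question: Python's built-in sum over the tensor's
# elements, which is why the result shows grad_fn=<AddBackward0>.
def f(q): return sum(q**2)

# Path 1: reduce to a scalar first, so backward() needs no argument.
x = torch.tensor([3., 4., 10.], requires_grad=True)
y = f(x)
print(y)        # tensor(125., grad_fn=<AddBackward0>)
y.backward()
print(x.grad)   # tensor([ 6.,  8., 20.])  == 2*x

# Path 2: keep y as a vector and pass a vector of ones to backward().
x2 = torch.tensor([3., 4., 10.], requires_grad=True)
y2 = x2**2
y2.backward(torch.ones_like(x2))
print(x2.grad)  # tensor([ 6.,  8., 20.])  -- same gradient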