I can see what the code below (from a video) is trying to do, but the sum in y = torch.sum(x**2) confuses me. After the sum operation, y is a tensor holding a single value. Since I understand .backward() as calculating derivatives, why would we want to use sum and reduce y to one value?
import torch
import matplotlib.pyplot as plt
x = torch.linspace(-10.0, 10.0, 10, requires_grad=True)  # 10 points in [-10, 10], tracked by autograd
Y = x**2                 # element-wise values, used for plotting
y = torch.sum(x**2)      # reduce to a single scalar value
y.backward()             # populates x.grad
plt.plot(x.detach().numpy(), Y.detach().numpy(), label="Y")
plt.plot(x.detach().numpy(), x.grad.detach().numpy(), label="derivatives")
plt.legend()
You can only compute partial derivatives of a scalar function: what backward() gives you is d(loss)/d(parameter), and you expect a single gradient value per parameter/variable.
Had your loss function been a vector function, i.e., a mapping from multiple inputs to multiple outputs, you would have ended up with multiple gradients per parameter/variable.
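To make this concrete, here is a minimal sketch (assuming the same x as in the question) of what happens if backward() is called on the non-reduced tensor, and of how summing first, or passing an explicit gradient argument, both give one gradient value per element, namely 2*x:

import torch

x = torch.linspace(-10.0, 10.0, 10, requires_grad=True)

# Calling backward() on the non-reduced tensor fails, because PyTorch
# cannot implicitly create the output gradient for a non-scalar output:
#   Y = x**2
#   Y.backward()  # RuntimeError: grad can be implicitly created only for scalar outputs

# Reducing to a scalar works; since d(sum(x_i**2))/dx_j = 2*x_j, each
# element of x.grad is just the derivative of its own x**2 term:
torch.sum(x**2).backward()
print(x.grad)                  # equal to 2*x

# Equivalent alternative: keep the vector output and pass the weighting
# explicitly, which computes the same vector-Jacobian product as the sum:
x.grad = None                  # clear the accumulated gradient before reusing x
Y = x**2
Y.backward(gradient=torch.ones_like(Y))
print(x.grad)                  # again equal to 2*x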
Please see this answer for more information.