I have a trained network and I want to calculate the gradients of the outputs w.r.t. the inputs. From the PyTorch docs, torch.autograd.grad looked useful, so I used the following code:
x_test = torch.randn(D_in,requires_grad=True)
y_test = model(x_test)
d = torch.autograd.grad(y_test, x_test)[0]
Here, model is the neural network, x_test is the input of size D_in, and y_test is a scalar output.
I want to compare the calculated result with the numerical derivative from scipy.misc.derivative, so I computed the partial derivative with respect to a single index:
from scipy.misc import derivative

idx = 3
x_test = torch.randn(D_in,requires_grad=True)
y_test = model(x_test)
print(x_test[idx].item())
d = torch.autograd.grad(y_test, x_test)[0]
print(d[idx].item())
def fun(x):
    x_input = x_test.detach()
    x_input[idx] = x
    with torch.no_grad():
        y = model(x_input)
    return y.item()
x0 = x_test[idx].item()
print(x0)
print(derivative(fun, x0, dx=1e-6))
But I got totally different results: the gradient calculated by torch.autograd.grad is -0.009522666223347187, while the one from scipy.misc.derivative is -0.014901161193847656. Is there anything wrong with the calculation, or am I using torch.autograd.grad incorrectly?
Gradients are calculated by tracing the graph from the root (the output) back to the leaves (the inputs) and multiplying every local gradient along the way using the chain rule.
The gradient collects the derivatives of the function: in mathematical terms, it holds the partial derivative of the output with respect to each input, i.e. how much the output changes as that input is varied.
Autograd is a reverse-mode automatic differentiation system. Conceptually, autograd records a graph of all the operations that created the data as you execute them, giving you a directed acyclic graph whose leaves are the input tensors and whose roots are the output tensors.
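For illustration, here is a minimal sketch of that leaf-to-root relationship, using a made-up linear function rather than the asker's network; the tensor names and sizes are purely illustrative:

import torch

# Hypothetical example: x is a leaf tensor, y is the root of the recorded graph.
x = torch.randn(4, requires_grad=True)   # leaf (input)
W = torch.randn(3, 4)
y = (W @ x).sum()                        # root (scalar output)

# autograd traverses the recorded DAG from the root back to the leaf.
(grad_x,) = torch.autograd.grad(y, x)

# For this linear function the exact gradient is the column sum of W.
print(torch.allclose(grad_x, W.sum(dim=0)))  # True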
In fact, it is very likely that your code is completely correct. Let me explain this by giving you a little background on backpropagation, or rather, in this case, automatic differentiation (AutoDiff).
The specific implementation of many packages is based on AutoGrad, a common technique for obtaining the exact derivatives of a function/graph. It does this by essentially "inverting" the forward computational pass: it computes piece-wise derivatives of atomic operations such as addition, subtraction, multiplication, and division, and then "chains them together".
I explained AutoDiff and its specifics in a more detailed answer to this question.
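To illustrate this chaining of atomic derivatives, here is a small sketch (the composite function below is made up, not taken from the question); autograd reproduces the hand-derived chain-rule expression up to floating-point precision:

import torch

# Composite function built from atomic ops (sin, pow): y = sin(x)**2.
# By the chain rule its exact derivative is 2*sin(x)*cos(x).
x = torch.tensor(1.3, requires_grad=True)
y = torch.sin(x) ** 2

(dy_dx,) = torch.autograd.grad(y, x)

# Compare against the hand-derived chain-rule derivative.
analytic = 2 * torch.sin(x.detach()) * torch.cos(x.detach())
print(torch.allclose(dy_dx, analytic))  # True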
In contrast, scipy's derivative function is only an approximation of this derivative, obtained with finite differences: it evaluates the function at nearby points and estimates the derivative from the difference in the function values at those points. This is why you see a discrepancy between the two gradients, since a finite difference can be an inaccurate representation of the actual derivative.
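To see the difference in character between the two approaches, here is a small sketch comparing the exact autograd gradient with a central finite difference, which is similar in spirit to what scipy.misc.derivative computes; the function and step size below are purely illustrative, not the asker's model:

import torch

# Illustrative scalar function, not the asker's model.
def f(x):
    return torch.sin(x) * torch.exp(-x)

x0 = torch.tensor(0.7, dtype=torch.float64, requires_grad=True)

# Exact derivative via autograd.
(exact,) = torch.autograd.grad(f(x0), x0)

# Central finite-difference estimate of the same derivative.
dx = 1e-6
with torch.no_grad():
    approx = (f(x0 + dx) - f(x0 - dx)) / (2 * dx)

print(exact.item())   # exact derivative from autograd
print(approx.item())  # close, but not identical

Note that the quality of the finite-difference estimate also depends on the step size relative to the precision of the function values: with a float32 model and a step as small as dx=1e-6, rounding error in the function evaluations can dominate the quotient, so the estimate can deviate noticeably from the exact autograd gradient.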