
How to use PyTorch to calculate the gradients of outputs w.r.t. the inputs in a neural network?

I have a trained network, and I want to calculate the gradients of its outputs w.r.t. the inputs. Going through the PyTorch docs, torch.autograd.grad looks useful, so I use the following code:

    x_test = torch.randn(D_in,requires_grad=True)
    y_test = model(x_test)
    d = torch.autograd.grad(y_test, x_test)[0]

model is the neural network, x_test is the input of size D_in, and y_test is a scalar output. I want to compare the computed gradient with a numerical difference from scipy.misc.derivative, so I calculated the partial derivative with respect to a single component, selected by an index:

    from scipy.misc import derivative
    import torch

    # model and D_in are defined earlier (the trained network and its input size)
    idx = 3
    x_test = torch.randn(D_in, requires_grad=True)
    y_test = model(x_test)
    print(x_test[idx].item())

    # gradient of the scalar output w.r.t. the whole input vector
    d = torch.autograd.grad(y_test, x_test)[0]
    print(d[idx].item())

    def fun(x):
        # evaluate the model with only the idx-th entry of the input changed
        x_input = x_test.detach().clone()  # clone so x_test itself is not modified in place
        x_input[idx] = x
        with torch.no_grad():
            y = model(x_input)
        return y.item()

    x0 = x_test[idx].item()
    print(x0)
    print(derivative(fun, x0, dx=1e-6))

But I got totally different results. The gradient calculated by torch.autograd.grad is -0.009522666223347187, while that by scipy.misc.derivative is -0.014901161193847656.

Is there anything wrong with the calculation, or am I using torch.autograd.grad incorrectly?

asked Aug 03 '18 06:08 by SungSingSong

People also ask

How are gradients calculated in PyTorch?

Gradients are calculated by tracing the graph from the root to the leaf and multiplying every gradient along the way using the chain rule.
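
For illustration (the function and values here are made up, not from the question), the chain rule that autograd applies can be checked by hand on a tiny composed function:

    import torch

    # tiny composed function: y = (w * x + b) ** 2
    x = torch.tensor(1.5, requires_grad=True)
    w, b = torch.tensor(2.0), torch.tensor(0.5)

    y = (w * x + b) ** 2
    y.backward()  # traces the graph from the root y back to the leaf x

    # chain rule by hand: dy/dx = 2 * (w * x + b) * w
    manual = 2 * (w * x + b) * w
    print(x.grad.item(), manual.item())  # both print 14.0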

What is a PyTorch gradient?

The gradient collects the derivatives of a function. In mathematical terms, this means partially differentiating the function with respect to each input and evaluating the result. The sketch below shows how to calculate the derivatives of a function.
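
A minimal sketch of that idea (the function and values are arbitrary, not from the question): compute the partial derivatives of f(x, y) = x² · y with autograd and compare them to the analytic results.

    import torch

    # f(x, y) = x**2 * y, so df/dx = 2*x*y and df/dy = x**2
    x = torch.tensor(3.0, requires_grad=True)
    y = torch.tensor(4.0, requires_grad=True)

    f = x ** 2 * y
    f.backward()

    print(x.grad.item())  # 2 * 3 * 4 = 24.0
    print(y.grad.item())  # 3 ** 2   =  9.0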

How does Autograd in PyTorch work?

Autograd is a reverse-mode automatic differentiation system. Conceptually, autograd records a graph of all the operations that created the data as you execute them, giving you a directed acyclic graph whose leaves are the input tensors and whose roots are the output tensors.
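
A small sketch (the tensor names are illustrative) of what that recorded graph looks like, with the inputs as leaves and the output as the root:

    import torch

    a = torch.randn(3, requires_grad=True)   # leaf (input tensor)
    b = torch.randn(3, requires_grad=True)   # leaf (input tensor)

    out = (a * b).sum()                      # root (output tensor)

    print(a.is_leaf, b.is_leaf)        # True True
    print(out.grad_fn)                 # e.g. <SumBackward0 object at 0x...>
    print(out.grad_fn.next_functions)  # edges pointing back towards the leaves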


1 Answer

In fact, it is very likely that your given code is completely correct. Let me explain this by giving you a little background information on backpropagation, or rather, in this case, Automatic Differentiation (AutoDiff).

The specific implementation of many packages is based on autograd, a common technique to get the exact derivatives of a function/graph. It does this by essentially "inverting" the forward computational pass to compute piece-wise derivatives of atomic function blocks, like addition, subtraction, multiplication, division, etc., and then "chaining them together".
I explained AutoDiff and its specifics in a more detailed answer in this question.
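
To make that concrete, here is a sketch (not the code from the question) in which a function built from atomic operations is differentiated by autograd, and the chained local derivatives are written out by hand; the two agree up to floating-point rounding:

    import torch

    x = torch.tensor(0.7, requires_grad=True)

    # built from atomic blocks: multiply, add, sin
    y = torch.sin(3.0 * x + 1.0)

    (grad,) = torch.autograd.grad(y, x)

    # local derivatives chained together by hand: cos(3*x + 1) * 3
    manual = torch.cos(torch.tensor(3.0 * 0.7 + 1.0)) * 3.0
    print(grad.item(), manual.item())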

By contrast, scipy's derivative function only approximates this derivative using finite differences: it evaluates the function at close-by points and estimates the derivative from the difference in function values at those points. This is why you see a slight difference between the two gradients, since a finite difference can be an inaccurate representation of the actual derivative.
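
To see this in code, here is a hedged sketch of the comparison; the two-layer model, D_in, and seed below are stand-ins, since the original model is not shown in the question. With double precision and a central difference, the finite-difference estimate lands close to (but not exactly on) the autograd gradient:

    import torch

    torch.manual_seed(0)
    D_in = 8
    # stand-in for the trained network from the question
    model = torch.nn.Sequential(
        torch.nn.Linear(D_in, 16), torch.nn.Tanh(), torch.nn.Linear(16, 1)
    ).double()

    x = torch.randn(D_in, dtype=torch.float64, requires_grad=True)
    y = model(x)
    d = torch.autograd.grad(y, x)[0]   # exact (reverse-mode AutoDiff)

    idx, dx = 3, 1e-6

    def f(v):
        x_in = x.detach().clone()
        x_in[idx] = v
        with torch.no_grad():
            return model(x_in).item()

    x0 = x[idx].item()
    central = (f(x0 + dx) - f(x0 - dx)) / (2 * dx)  # finite-difference estimate
    print(d[idx].item(), central)      # close, but not bit-identical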

answered Oct 21 '22 12:10 by dennlinger