Higher order gradients in PyTorch

I have implemented the following Jacobian function in PyTorch. Unless I have made a mistake, it computes the Jacobian of any tensor with respect to inputs of any dimensionality:

import torch
import torch.autograd as ag

def nd_range(stop, dims=None):
    # Generate every index tuple for a tensor whose shape is `stop`.
    if dims is None:
        dims = len(stop)
    if not dims:
        yield ()
        return
    for outer in nd_range(stop, dims - 1):
        for inner in range(stop[dims - 1]):
            yield outer + (inner,)


def full_jacobian(f, wrt):
    # Differentiate every scalar element of f w.r.t. the whole of wrt,
    # then stack the results into a tensor of shape f.shape + wrt.shape.
    f_shape = list(f.size())
    wrt_shape = list(wrt.size())
    fs = []

    for f_ind in nd_range(f_shape):
        # create_graph=True keeps the graph so higher-order derivatives remain possible.
        grad = ag.grad(f[tuple(f_ind)], wrt, retain_graph=True, create_graph=True)[0]
        for i in range(len(f_shape)):
            grad = grad.unsqueeze(0)
        fs.append(grad)

    fj = torch.cat(fs, dim=0)
    fj = fj.view(f_shape + wrt_shape)
    return fj

On top of this, I have tried to implement a recursive function to calculate nth-order derivatives:

def nth_derivative(f, wrt, n):
    # Apply full_jacobian n times, differentiating the previous result each time.
    if n == 1:
        return full_jacobian(f, wrt)
    deriv = nth_derivative(f, wrt, n - 1)
    return full_jacobian(deriv, wrt)

I ran a simple test:

op = torch.ger(s, s)
deep_deriv = nth_derivative(op, s, 5)
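
Here s is a 1-D tensor with requires_grad=True (torch.ger takes two vectors). As a sanity check of full_jacobian on its own, something like the following works (the exact value and shape of s are illustrative):

s = torch.randn(3, requires_grad=True)
op = torch.ger(s, s)        # outer product, shape (3, 3); torch.outer in newer PyTorch
J = full_jacobian(op, s)    # Jacobian of op w.r.t. s
print(J.shape)              # torch.Size([3, 3, 3])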

Unfortunately, this succeeds in getting me the Hessian, but no higher-order derivatives. I'm aware that many of the higher-order derivatives should be 0, but I'd prefer it if PyTorch could compute that analytically.

One fix has been to change the gradient calculation to:

try:
    grad = ag.grad(f[tuple(f_ind)], wrt, retain_graph=True, create_graph=True)[0]
except RuntimeError:
    grad = torch.zeros_like(wrt)
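
(An equivalent approach that avoids relying on exception handling, assuming the error being caught is the usual "element does not require grad" RuntimeError, would be to check requires_grad on the element first:)

elem = f[tuple(f_ind)]
if elem.requires_grad:
    grad = ag.grad(elem, wrt, retain_graph=True, create_graph=True)[0]
else:
    # The element is a constant w.r.t. wrt, so its derivative is identically zero.
    grad = torch.zeros_like(wrt)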

Is this the accepted, correct way to handle this? Or is there a better option? Or have I misunderstood the cause of my issue entirely?

asked May 14 '18 by user650261

2 Answers

You can just call the grad function in a loop:

import torch
from torch.autograd import grad

def nth_derivative(f, wrt, n):
    for i in range(n):
        # Keep the graph so the result can be differentiated again.
        grads = grad(f, wrt, create_graph=True)[0]
        # Summing gives a scalar that grad() can be applied to on the next pass.
        f = grads.sum()
    return grads

x = torch.arange(4., requires_grad=True).reshape(2, 2)  # float dtype, since only floating-point tensors can require grad
loss = (x ** 4).sum()

print(nth_derivative(f=loss, wrt=x, n=3))

outputs

tensor([[  0.,  24.],
        [ 48.,  72.]])
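
The f = grads.sum() step is only there to collapse the gradient back to a scalar so that grad() accepts it on the next pass. An equivalent formulation (a sketch; nth_derivative_v2 is a name chosen here for illustration) passes an explicit grad_outputs of ones instead of summing:

import torch
from torch.autograd import grad

def nth_derivative_v2(f, wrt, n):
    for _ in range(n):
        # Backpropagating a tensor of ones is equivalent to differentiating f.sum().
        f = grad(f, wrt, grad_outputs=torch.ones_like(f), create_graph=True)[0]
    return f

x = torch.arange(4., requires_grad=True).reshape(2, 2)
print(nth_derivative_v2(f=(x ** 4).sum(), wrt=x, n=3))  # same tensor as above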
answered Sep 23 '22 by Alex


For the second-order derivative, you can use PyTorch's hessian function:

torch.autograd.functional.hessian()

For higher-order derivatives, you can repeatedly call jacobian or grad while keeping the computational graph alive, as the documentation for create_graph describes:

create_graph (bool, optional) – If True, graph of the derivative will be constructed, allowing to compute higher order derivative products.
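
Putting the two together, a minimal sketch (reusing the (x ** 4).sum() example from the other answer; the names f, H and third are chosen here for illustration) might look like:

import torch
from torch.autograd.functional import hessian, jacobian

def f(x):
    return (x ** 4).sum()

x = torch.arange(4.).reshape(2, 2)

# Second-order derivative via the built-in helper.
H = hessian(f, x)                                                # shape (2, 2, 2, 2)

# Third-order derivative: differentiate the Hessian again, keeping the
# inner graph alive with create_graph=True so the outer jacobian can use it.
third = jacobian(lambda t: hessian(f, t, create_graph=True), x)  # shape (2, 2, 2, 2, 2, 2)

print(H.shape, third.shape)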

answered Sep 22 '22 by iacob