I am attempting to implement automatic differentiation for a Python statistics package (the problem formulation is similar to that of an optimization problem).
The computational graph is generated using operator overloading and factory functions for operations like sum(), exp(), etc. I have implemented automatic differentiation for the gradient using reverse accumulation. However, I have found implementing automatic differentiation for the second derivative (the Hessian) much more difficult. I know how to do the individual second-order partial derivative calculations, but I have had trouble coming up with an intelligent way to traverse the graph and do the accumulations.

Does anyone know of good articles that give algorithms for automatic differentiation of the second derivative, or open-source libraries that implement it, that I could learn from?
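To illustrate, here is a minimal sketch of the kind of setup I mean (simplified to + and * only; names are placeholders): operator overloading builds the graph, then a single reverse sweep accumulates adjoints to produce the gradient.

```python
class Node:
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents      # (parent_node, local_derivative) pairs
        self.adjoint = 0.0

    def __add__(self, other):
        other = other if isinstance(other, Node) else Node(other)
        return Node(self.value + other.value, [(self, 1.0), (other, 1.0)])
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Node) else Node(other)
        return Node(self.value * other.value,
                    [(self, other.value), (other, self.value)])
    __rmul__ = __mul__


def backward(output):
    # Topologically order the graph, then sweep it once in reverse,
    # pushing each node's adjoint into its parents.
    order, seen = [], set()

    def topo(v):
        if id(v) in seen:
            return
        seen.add(id(v))
        for p, _ in v.parents:
            topo(p)
        order.append(v)

    topo(output)
    output.adjoint = 1.0
    for v in reversed(order):
        for p, local in v.parents:
            p.adjoint += v.adjoint * local


# f(x, y) = x*y + y  ->  df/dx = y, df/dy = x + 1
x, y = Node(3.0), Node(4.0)
f = x * y + y
backward(f)
print(x.adjoint, y.adjoint)  # 4.0 4.0
```

This gives first derivatives in one sweep; it is exactly the second sweep for the Hessian that I am stuck on.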
First you must decide whether you want to calculate a sparse Hessian or something closer to a fully dense Hessian.
If sparse is what you want, there are currently two competitive ways of doing it. Using only the computational graph in a clever way, you can calculate the Hessian matrix in a single reverse sweep of the graph with the edge_pushing algorithm:
http://www.tandfonline.com/doi/full/10.1080/10556788.2011.580098
Or you can try graph colouring techniques to compress your Hessian matrix into a matrix with fewer columns, then use reverse accumulation to calculate each compressed column:
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.66.2603
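To make the compression idea concrete, here is a toy illustration in plain NumPy (the Hessian-vector products are written as explicit matrix products; in a real AD setting each one would come from a reverse sweep). For a tridiagonal Hessian, giving column i colour i mod 3 means no two same-coloured columns share a nonzero row, so three products H @ s recover every entry:

```python
import numpy as np

# Example sparse (tridiagonal) Hessian with a known sparsity pattern.
n = 6
H = (np.diag(np.arange(1.0, n + 1))
     + np.diag(np.full(n - 1, 0.5), 1)
     + np.diag(np.full(n - 1, 0.5), -1))

# Seed matrix: column i gets colour i mod 3, so the n columns of H are
# compressed into just 3 columns.
seeds = np.zeros((n, 3))
for i in range(n):
    seeds[i, i % 3] = 1.0

# 3 Hessian-vector products instead of n (each obtainable with one
# reverse-accumulation sweep in a real AD implementation).
compressed = H @ seeds

# Recovery: H[r, c] lives in column c % 3 of the compressed matrix,
# because no other column of the same colour has a nonzero in row r.
recovered = np.zeros_like(H)
rows, cols = np.nonzero(H)          # known sparsity pattern
for r, c in zip(rows, cols):
    recovered[r, c] = compressed[r, c % 3]

assert np.allclose(recovered, H)
```

The saving grows with the problem size: the number of sweeps depends on the chromatic number of the sparsity pattern, not on the dimension n.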
If what you want is a dense Hessian (unusual in practice), then you're probably better off calculating one column of the Hessian at a time using reverse accumulation (search for Bruce Christianson and reverse accumulation).
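One hedged way to sketch the column-at-a-time idea (a toy with + and * only, not Christianson's full algorithm; all names are hypothetical): if the reverse sweep accumulates adjoints as graph nodes instead of plain numbers, each gradient component is itself a differentiable graph, and a second reverse sweep over df/dx_i yields the i-th column of the Hessian.

```python
class Var:
    def __init__(self, value, parents=None):
        self.value = value
        self.parents = parents or []   # (parent_var, local_derivative_var)

    def __add__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value + other.value,
                   [(self, Var(1.0)), (other, Var(1.0))])
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        # Local derivatives are Vars, not floats, which keeps the
        # reverse sweep itself differentiable.
        return Var(self.value * other.value, [(self, other), (other, self)])
    __rmul__ = __mul__


def grad(output, inputs):
    # One reverse sweep; adjoints are built as Var expressions, so the
    # returned gradient components can be differentiated again.
    order, seen = [], set()

    def topo(v):
        if id(v) in seen:
            return
        seen.add(id(v))
        for p, _ in v.parents:
            topo(p)
        order.append(v)

    topo(output)
    adjoint = {id(output): Var(1.0)}
    for v in reversed(order):
        a = adjoint.get(id(v))
        if a is None:
            continue
        for p, local in v.parents:
            contrib = a * local
            adjoint[id(p)] = (adjoint[id(p)] + contrib
                              if id(p) in adjoint else contrib)
    return [adjoint.get(id(x), Var(0.0)) for x in inputs]


# f(x, y) = x*x*y + y  ->  gradient (2xy, x^2 + 1)
x, y = Var(3.0), Var(2.0)
f = x * x * y + y
gx, gy = grad(f, [x, y])
print(gx.value, gy.value)    # 12.0 10.0

# Second reverse sweep over df/dx gives the first Hessian column (2y, 2x).
hxx, hxy = grad(gx, [x, y])
print(hxx.value, hxy.value)  # 4.0 6.0
```

Running one extra sweep per input variable builds up the dense Hessian column by column; the trade-off is that the adjoint graph grows with each sweep, which is why the sparse approaches above are preferred when the structure allows.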