I am having a hard time optimizing a program that relies on ad's conjugateGradientDescent function for most of its work.
Basically my code is a translation of an old paper's code that was written in Matlab and C. I have not measured it, but that code runs at several iterations per second; mine is on the order of minutes per iteration ...
The code is available in these repositories:
The code in question can be run by following these commands:
$ cd aer-utils
$ cabal sandbox init
$ cabal sandbox add-source ../aer
$ cabal run learngabors
Using GHC's profiling facilities I have confirmed that the descent is in fact the part that is taking most of the time:
(interactive version here: https://dl.dropboxusercontent.com/u/2359191/learngabors.svg)
-s is telling me that productivity is quite low:
Productivity 33.6% of total user, 33.6% of total elapsed
From what I have gathered there are two things that might lead to higher performance:
Unboxing: currently I use a custom matrix implementation (in src/Data/SimpleMat.hs). This was the only way I could get ad to work with matrices (see: How to do automatic differentiation on hmatrix?). My guess is that a matrix type like newtype Mat w h a = Mat (Unboxed.Vector a) would achieve better performance due to unboxing and fusion. I found some code that has ad instances for unboxed vectors, but so far I haven't been able to use these with the conjugateGradientFunction. A rough sketch of such a matrix type follows after this list.
Matrix derivatives: In an email I can't find at the moment, Edward mentions that it would be better to use Forward instances for matrix types instead of having matrices filled with Forward instances. I have a faint idea how to achieve that, but have yet to figure out how I'd implement it in terms of ad's type classes.
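As a minimal sketch of the kind of unboxed matrix type meant in the first point (the module name, helpers and type-level dimensions are my own assumptions, not code from the repository):

{-# LANGUAGE DataKinds, KindSignatures, ScopedTypeVariables #-}
module Data.UnboxedMat where

import           Data.Proxy          (Proxy (..))
import qualified Data.Vector.Unboxed as U
import           GHC.TypeLits        (KnownNat, Nat, natVal)

-- Width and height are phantom type-level naturals over a flat unboxed vector.
newtype Mat (w :: Nat) (h :: Nat) a = Mat (U.Vector a)

-- Element-wise addition; U.zipWith is subject to stream fusion.
addMat :: (U.Unbox a, Num a) => Mat w h a -> Mat w h a -> Mat w h a
addMat (Mat u) (Mat v) = Mat (U.zipWith (+) u v)

-- Build a matrix from a generator over (row, column) indices, row-major.
genMat :: forall w h a. (KnownNat w, KnownNat h, U.Unbox a)
       => (Int -> Int -> a) -> Mat w h a
genMat f = Mat (U.generate (w * h) (\i -> f (i `div` w) (i `mod` w)))
  where
    w = fromIntegral (natVal (Proxy :: Proxy w))
    h = fromIntegral (natVal (Proxy :: Proxy h))

On its own this doesn't resolve the problem described in the answer below: the values ad threads through the objective function are boxed, so they have no U.Unbox instance and can't be stored in such a vector.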
This is probably a question that is too broad to be answered on SO, so if you are willing to help me out here, feel free to contact me on Github.
A key benefit of AD is that it frees users of machine learning methods from having to provide possibly cumbersome derivatives of loss functions or log posteriors by hand, thereby enabling the training of varied and custom machine learning models at scale.
Forward mode automatic differentiation is accomplished by augmenting the algebra of real numbers and obtaining a new arithmetic. An additional component is added to every number to represent the derivative of a function at the number, and all arithmetic operators are extended for the augmented algebra.
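To make that concrete, here is a minimal sketch of the dual-number idea (the Dual type and the diff' helper are illustrative names, not the ad library's API):

-- Each value carries its derivative; the arithmetic operators are
-- extended so that both components are propagated together.
data Dual = Dual Double Double   -- (value, derivative)

instance Num Dual where
  Dual x x' + Dual y y' = Dual (x + y) (x' + y')
  Dual x x' - Dual y y' = Dual (x - y) (x' - y')
  Dual x x' * Dual y y' = Dual (x * y) (x' * y + x * y')
  abs    (Dual x x')    = Dual (abs x) (x' * signum x)
  signum (Dual x _)     = Dual (signum x) 0
  fromInteger n         = Dual (fromInteger n) 0

-- Derivative of f at x: seed the derivative component of the input with 1.
diff' :: (Dual -> Dual) -> Double -> Double
diff' f x = let Dual _ d = f (Dual x 1) in d

-- diff' (\x -> x * x + 3 * x) 2  ==  7.0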
Reverse mode automatic differentiation uses an extension of the forward mode computational graph to enable the computation of a gradient by a reverse traversal of the graph. As the software runs the code to compute the function and its derivative, it records operations in a data structure called a trace.
A Wengert list is the tape describing the order in which the operations were originally executed. There is also source-transformation-based AD; a nice example of such a system is Tangent. Nowadays almost no one uses the tape (Wengert list) any more.
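As a toy illustration of the tape idea (all names here are hypothetical and unrelated to ad's internal representation), each entry records which earlier entries it depends on together with the local partial derivatives, and a single reverse sweep accumulates the adjoints:

import           Data.List       (foldl')
import qualified Data.Map.Strict as M

-- One recorded operation: the tape indices of its inputs and the local
-- partial derivative with respect to each of them.
newtype Node = Node [(Int, Double)]

-- The tape is the list of nodes in execution order; the first entries
-- stand for the input variables.
type Tape = [Node]

-- Reverse sweep: seed the output adjoint with 1 and push adjoints back
-- to the parents in reverse execution order.
backprop :: Tape -> Int -> M.Map Int Double
backprop tape out = foldl' step (M.singleton out 1) (reverse (zip [0 ..] tape))
  where
    step adj (i, Node parents) =
      let a = M.findWithDefault 0 i adj
      in  foldl' (\m (p, d) -> M.insertWith (+) p (a * d) m) adj parents

-- f(x, y) = x * y + sin x, recorded for x = node 0 and y = node 1:
exampleTape :: Double -> Double -> Tape
exampleTape x y =
  [ Node []                  -- 0: input x
  , Node []                  -- 1: input y
  , Node [(0, y), (1, x)]    -- 2: x * y
  , Node [(0, cos x)]        -- 3: sin x
  , Node [(2, 1), (3, 1)]    -- 4: (x * y) + sin x
  ]

-- M.lookup 0 (backprop (exampleTape 1 2) 4) == Just (2 + cos 1)   -- df/dx = y + cos x
-- M.lookup 1 (backprop (exampleTape 1 2) 4) == Just 1.0           -- df/dy = x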
You are running into pretty much the worst-case scenario for the current ad library here.
FWIW, you won't be able to use the existing ad classes/types with "matrix/vector ad". It'd be a fairly large engineering effort, see https://github.com/ekmett/ad/issues/2
As for why you can't unbox: conjugateGradient requires the ability to use Kahn mode or two levels of forward mode on your functions. The former precludes it from working with unboxed vectors, as the data types carry syntax trees and can't be unboxed. For various technical reasons I haven't figured out how to make it work with a fixed-size 'tape' like the standard Reverse mode.
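To see why syntax-tree-carrying values resist unboxing, a deliberately simplified picture (purely illustrative, not ad's actual Kahn-mode representation) looks like this:

-- Each "number" is a boxed tree of constructors rather than a flat
-- machine value.
data Expr a
  = Lit a
  | Var Int
  | Add (Expr a) (Expr a)
  | Mul (Expr a) (Expr a)

-- Data.Vector.Unboxed can only hold flat, fixed-width element types, so
-- there is no sensible Unbox instance for Expr a: its size is not fixed
-- and its fields are pointers.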
I think the "right" answer here is for us to sit down and figure out how to get matrix/vector AD right and integrated into the package, but I confess I'm timesliced a bit too thinly right now to give it the attention it deserves.
If you get a chance to swing by #haskell-lens on irc.freenode.net I'd be happy to talk about designs in this space and offer advice. Alex Lang has also been working on ad a lot and is often present there and may have ideas.