I am having a hard time optimizing a program that relies on ad's conjugateGradientDescent function for most of its work.
Basically my code is a translation of an old paper's code that was written in Matlab and C. I have not measured it, but that code runs at several iterations per second; mine is on the order of minutes per iteration ...
The code is available in these repositories:
The code in question can be run by following these commands:
$ cd aer-utils
$ cabal sandbox init
$ cabal sandbox add-source ../aer
$ cabal run learngabors
Using GHC's profiling facilities I have confirmed that the descent is in fact the part that is taking most of the time:
(interactive version here: https://dl.dropboxusercontent.com/u/2359191/learngabors.svg)
-s is telling me that productivity is quite low:
Productivity 33.6% of total user, 33.6% of total elapsed
From what I have gathered there are two things that might lead to higher performance:
Unboxing: currently I use a custom matrix implementation (in src/Data/SimpleMat.hs). This was the only way I could get ad to work with matrices (see: How to do automatic differentiation on hmatrix?). My guess is that a matrix type like newtype Mat w h a = Mat (Unboxed.Vector a) would achieve better performance due to unboxing and fusion. I found some code that has ad instances for unboxed vectors, but so far I haven't been able to use these with the conjugateGradientFunction. A rough sketch of such a matrix type follows after this list.
Matrix derivatives: In an email I can't find at the moment, Edward mentions that it would be better to use Forward instances for matrix types instead of having matrices filled with Forward instances. I have a faint idea how to achieve that, but have yet to figure out how I'd implement it in terms of ad's type classes.
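As a minimal sketch of the kind of unboxed matrix type meant in the first point (the module name, helpers and type-level dimensions are my own assumptions, not code from the repository):

{-# LANGUAGE DataKinds, KindSignatures, ScopedTypeVariables #-}
module Data.UnboxedMat where

import           Data.Proxy          (Proxy (..))
import qualified Data.Vector.Unboxed as U
import           GHC.TypeLits        (KnownNat, Nat, natVal)

-- Width and height are phantom type-level naturals over a flat unboxed vector.
newtype Mat (w :: Nat) (h :: Nat) a = Mat (U.Vector a)

-- Element-wise addition; U.zipWith is subject to stream fusion.
addMat :: (U.Unbox a, Num a) => Mat w h a -> Mat w h a -> Mat w h a
addMat (Mat u) (Mat v) = Mat (U.zipWith (+) u v)

-- Build a matrix from a generator over (row, column) indices, row-major.
genMat :: forall w h a. (KnownNat w, KnownNat h, U.Unbox a)
       => (Int -> Int -> a) -> Mat w h a
genMat f = Mat (U.generate (w * h) (\i -> f (i `div` w) (i `mod` w)))
  where
    w = fromIntegral (natVal (Proxy :: Proxy w))
    h = fromIntegral (natVal (Proxy :: Proxy h))

On its own this doesn't resolve the problem described in the answer below: the values ad threads through the objective function are boxed, so they have no U.Unbox instance and can't be stored in such a vector.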
This is probably a question that is too broad to be answered on SO, so if you are willing to help me out here, feel free to contact me on Github.
A key benefit of AD is that it frees users of machine learning methods from having to provide possibly cumbersome derivatives of loss functions or log posteriors by hand, thereby enabling the training of varied and custom machine learning models at scale.
Forward mode automatic differentiation is accomplished by augmenting the algebra of real numbers and obtaining a new arithmetic. An additional component is added to every number to represent the derivative of a function at the number, and all arithmetic operators are extended for the augmented algebra.
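To make that concrete, here is a minimal sketch of the dual-number idea (the Dual type and the diff' helper are illustrative names, not the ad library's API):

-- Each value carries its derivative; the arithmetic operators are
-- extended so that both components are propagated together.
data Dual = Dual Double Double   -- (value, derivative)

instance Num Dual where
  Dual x x' + Dual y y' = Dual (x + y) (x' + y')
  Dual x x' - Dual y y' = Dual (x - y) (x' - y')
  Dual x x' * Dual y y' = Dual (x * y) (x' * y + x * y')
  abs    (Dual x x')    = Dual (abs x) (x' * signum x)
  signum (Dual x _)     = Dual (signum x) 0
  fromInteger n         = Dual (fromInteger n) 0

-- Derivative of f at x: seed the derivative component of the input with 1.
diff' :: (Dual -> Dual) -> Double -> Double
diff' f x = let Dual _ d = f (Dual x 1) in d

-- diff' (\x -> x * x + 3 * x) 2  ==  7.0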
Reverse mode automatic differentiation uses an extension of the forward mode computational graph to enable the computation of a gradient by a reverse traversal of the graph. As the software runs the code to compute the function and its derivative, it records operations in a data structure called a trace.
A Wengert list is the tape describing the order in which the operations were originally executed. There is also source-transformation-based AD; a nice example of such a system is Tangent. Nowadays almost no one uses the tape (Wengert list) any more.
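As a toy illustration of the tape idea (all names here are hypothetical and unrelated to ad's internal representation), each entry records which earlier entries it depends on together with the local partial derivatives, and a single reverse sweep accumulates the adjoints:

import           Data.List       (foldl')
import qualified Data.Map.Strict as M

-- One recorded operation: the tape indices of its inputs and the local
-- partial derivative with respect to each of them.
newtype Node = Node [(Int, Double)]

-- The tape is the list of nodes in execution order; the first entries
-- stand for the input variables.
type Tape = [Node]

-- Reverse sweep: seed the output adjoint with 1 and push adjoints back
-- to the parents in reverse execution order.
backprop :: Tape -> Int -> M.Map Int Double
backprop tape out = foldl' step (M.singleton out 1) (reverse (zip [0 ..] tape))
  where
    step adj (i, Node parents) =
      let a = M.findWithDefault 0 i adj
      in  foldl' (\m (p, d) -> M.insertWith (+) p (a * d) m) adj parents

-- f(x, y) = x * y + sin x, recorded for x = node 0 and y = node 1:
exampleTape :: Double -> Double -> Tape
exampleTape x y =
  [ Node []                  -- 0: input x
  , Node []                  -- 1: input y
  , Node [(0, y), (1, x)]    -- 2: x * y
  , Node [(0, cos x)]        -- 3: sin x
  , Node [(2, 1), (3, 1)]    -- 4: (x * y) + sin x
  ]

-- M.lookup 0 (backprop (exampleTape 1 2) 4) == Just (2 + cos 1)   -- df/dx = y + cos x
-- M.lookup 1 (backprop (exampleTape 1 2) 4) == Just 1.0           -- df/dy = x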
You are running into pretty much the worst-case scenario for the current ad library here.
FWIW, you won't be able to use the existing ad classes/types with "matrix/vector ad". It'd be a fairly large engineering effort, see https://github.com/ekmett/ad/issues/2
As for why you can't unbox: conjugateGradient requires the ability to use Kahn mode or two levels of forward mode on your functions. The former precludes it from working with unboxed vectors, as the data types carry syntax trees and can't be unboxed. For various technical reasons I haven't figured out how to make it work with a fixed-size 'tape' like the standard Reverse mode.
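To see why syntax-tree-carrying values resist unboxing, a deliberately simplified picture (purely illustrative, not ad's actual Kahn-mode representation) looks like this:

-- Each "number" is a boxed tree of constructors rather than a flat
-- machine value.
data Expr a
  = Lit a
  | Var Int
  | Add (Expr a) (Expr a)
  | Mul (Expr a) (Expr a)

-- Data.Vector.Unboxed can only hold flat, fixed-width element types, so
-- there is no sensible Unbox instance for Expr a: its size is not fixed
-- and its fields are pointers.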
I think the "right" answer here is for us to sit down and figure out how to get matrix/vector AD right and integrated into the package, but I confess I'm timesliced a bit too thinly right now to give it the attention it deserves.
If you get a chance to swing by #haskell-lens on irc.freenode.net I'd be happy to talk about designs in this space and offer advice. Alex Lang has also been working on ad a lot and is often present there and may have ideas.