Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Haskell: list/vector/array performance tuning

I am trying out Haskell to compute partition functions of models in statistical physics. This involves traversing quite large lists of configurations and summing various observables - which I would like to do as efficiently as possible.

The current version of my code is here: https://gist.github.com/2420539

Some strange things happen when trying to choose between lists and vectors to enumerate the configurations; in particular, to truncate the list, using V.toList . V.take (3^n) . V.fromList (where V is Data.Vector) is faster than just using take, which feels a bit counter-intuitive. In both cases the list is evaluated lazily.

The list itself is built using iterate; if instead I use Vectors as much as possible and build the list by using V.iterateN, again it becomes slower ...

My question is, is there a way (other than splicing V.toList and V.fromList at random places in the code) to predict which one will be the quickest? (BTW, I compile everything using ghc -O2 with the current stable version.)

like image 360
Vincent Beffara Avatar asked Apr 19 '12 12:04

Vincent Beffara


1 Answers

Vectors are strict, and have O(1) subsets (e.g. take). They also have an optimized insert and delete. So you will sometimes see performance improvements by switching data structures on the fly. However, it is usually the wrong approach -- keeping all data in either one form or the other is better. (And you're using UArrays as well -- further confusing the issue).

General rules:

  • If the data is large and being transformed only in bulk fashion, using a dense, efficient structures like vectors make sense.

  • If the data is small, and traversed linearly, rarely, then lists make sense.

Remember that operations on lists and vectors have different complexity, so while iterate . replicate on lists is O(n), but lazy, the same on vectors will not necessarily be as efficient (you should prefer the built in methods in vector to generate arrays).

Generally, vectors should always be better for numerical operations. It might be that you have to use different functions that you do in lists.

I would stick to vectors only. Avoid UArrays, and avoid lists except as generators.

like image 163
Don Stewart Avatar answered Nov 04 '22 10:11

Don Stewart