
Haskell - simple way to cache a function call

I have functions like:

millionsOfCombinations = [[a, b, c, d] | 
  a <- filter (...some filter...) someListOfAs, 
  b <- (...some other filter...) someListOfBs, 
  c <- someListOfCs, d <- someListOfDs]

aLotOfCombinationsOfCombinations = [[comb1, comb2, comb3] | 
  comb1 <- millionsOfCombinations, 
  comb2 <- millionsOfCombinations,
  comb3 <- someList,
  ...around 10 function calls to find if
    [comb1, comb2, comb3] is actually useful]

Evaluating millionsOfCombinations takes about 40 seconds on a very fast workstation. Evaluating aLotOfCombinationsOfCombinations !! 0 took 2 days :-(

How can I speed up this code? So far I've had two ideas. The first was to use a profiler. I tried running myapp +RTS -sstderr after compiling with GHC, but got a blank screen, and I don't want to wait days for it to finish.

My second thought was to somehow cache millionsOfCombinations. Do I understand correctly that millionsOfCombinations gets re-evaluated for each value in aLotOfCombinationsOfCombinations? If so, how can I cache the result? Obviously I've just started learning Haskell. I know there is a way to do call caching with a monad, but I still don't understand those things.

asked Aug 10 '11 20:08 by Sara Darcy

2 Answers

Use the -fforce-recomp, -O2 and -fllvm flags

If you aren't already, be sure to use the above flags. I wouldn't normally mention it, but I've seen some recent questions from people who didn't realize that full optimization isn't enabled by default.

Profile Your Code

The -sstderr flag isn't exactly profiling. When people say profiling they're usually talking about either heap profiling or time profiling, via the -prof and -auto-all flags.
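As a sketch of the compile-and-profile workflow (the file name myapp.hs is a placeholder for your own module; newer GHC versions spell -auto-all as -fprof-auto):

```shell
# Compile with optimization and profiling support enabled.
ghc -O2 -prof -auto-all -rtsopts -fforce-recomp myapp.hs

# Run with time/allocation profiling; this writes a report to myapp.prof
# attributing runtime and allocation to each cost centre.
./myapp +RTS -p -RTS
```

The resulting .prof file shows where the time actually goes, which is far more useful here than the summary statistics from -sstderr.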

Avoid Costly Primitives

If you need the entire list in memory (i.e. it isn't going to be optimized away) then consider unboxed vectors. If Int will do instead of Integer, consider that (but Integer is a reasonable default when you don't know!). Use worker/wrapper transforms at the right times. If you're leaning heavily on Data.Map, try using Data.HashMap from the unordered-containers library. This list can go on and on, but since you don't already have an intuition on where your computation time is going, the profiling should come first!
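To illustrate the Int-versus-Integer point, here is a minimal sketch (the function name is mine, not from the question): a strict fold over fixed-width Int values can stay unboxed, whereas the same fold over Integer allocates an arbitrary-precision heap object at every step.

```haskell
import Data.List (foldl')

-- Strict left fold with a fixed-width Int accumulator. With the lazy
-- foldl, or with an Integer accumulator, each step would instead build
-- a thunk or a heap-allocated arbitrary-precision number.
sumInts :: [Int] -> Int
sumInts = foldl' (+) 0
```

The same principle applies to the combination lists in the question: the tighter the representation of each element, the less the garbage collector has to chase.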

answered Oct 29 '22 18:10 by Thomas M. DuBuisson


I think there is no way. Notice that the time to generate the list grows with each list involved, so you get around 1,000,000³ combinations to check, which indeed takes a lot of time. Caching the list is possible, but is unlikely to change anything, since new elements can be generated almost instantly. The only way out is probably to change the algorithm.

If millionsOfCombinations is a constant (a binding with no arguments, not a function), it is evaluated only once and cached automatically. Otherwise, make it a constant by using a where clause:

aLotOfCombinationsOfCombinations = [[comb1, comb2, comb3] | 
  comb1 <- millionsOfCombinations, 
  comb2 <- millionsOfCombinations,
  comb3 <- someList,
  ...around 10 function calls to find if
    [comb1, comb2, comb3] is actually useful] where

  millionsOfCombinations = makeCombination xyz
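(makeCombination and xyz above are placeholders from the answer.) A minimal self-contained sketch of the same sharing behavior, with names of my own: a zero-argument top-level binding is a constant applicative form (CAF), so GHC evaluates it at most once and reuses the cached result at every use site.

```haskell
-- expensiveList is a CAF: it has no arguments, so it is evaluated at
-- most once and then shared by every expression that mentions it.
expensiveList :: [Int]
expensiveList = [x * y | x <- [1 .. 1000], y <- [1 .. 1000]]

-- Both generators walk the same cached list; the comprehension does
-- not rebuild expensiveList for each value of a.
somePairs :: [(Int, Int)]
somePairs = [(a, b) | a <- take 3 expensiveList, b <- take 3 expensiveList]
```

This is exactly why caching alone doesn't save the original program: the list is already shared, and the cost lives in the sheer number of combinations examined.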
answered Oct 29 '22 18:10 by fuz