
Haskell - simple way to cache a function call

I have functions like:

millionsOfCombinations = [[a, b, c, d] | 
  a <- filter (...some filter...) someListOfAs, 
  b <- (...some other filter...) someListOfBs, 
  c <- someListOfCs, d <- someListOfDs]

aLotOfCombinationsOfCombinations = [[comb1, comb2, comb3] | 
  comb1 <- millionsOfCombinations, 
  comb2 <- millionsOfCombinations,
  comb3 <- someList,
  ...around 10 function calls to find if
    [comb1, comb2, comb3] is actually useful]

Evaluating millionsOfCombinations takes about 40 seconds on a very fast workstation. Evaluating aLotOfCombinationsOfCombinations !! 0 took 2 days :-(

How can I speed up this code? So far I've had two ideas. The first was to use a profiler. I tried running myapp +RTS -sstderr after compiling with GHC, but got a blank screen, and I don't want to wait days for it to finish.

My second thought was to somehow cache millionsOfCombinations. Do I understand correctly that millionsOfCombinations gets re-evaluated for each value in aLotOfCombinationsOfCombinations? If so, how can I cache the result? Obviously I've just started learning Haskell. I know there is a way to do call caching with a monad, but I still don't understand those things.

asked Aug 10 '11 20:08 by Sara Darcy

2 Answers

Use the -fforce-recomp, -O2 and -fllvm flags

If you aren't already, be sure to use the above flags. I wouldn't normally mention it, but I've seen some recent questions from people who didn't realize that full optimization isn't enabled by default.

Profile Your Code

The -sstderr flag isn't exactly profiling. When people say profiling they're usually talking about either heap profiling or time profiling, via the -prof and -auto-all flags.
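As a sketch of the compile-and-profile workflow (the file name myapp.hs is a placeholder for your own module; newer GHC versions spell -auto-all as -fprof-auto):

```shell
# Compile with optimization and profiling support enabled.
ghc -O2 -prof -auto-all -rtsopts -fforce-recomp myapp.hs

# Run with time/allocation profiling; this writes a report to myapp.prof
# attributing runtime and allocation to each cost centre.
./myapp +RTS -p -RTS
```

The resulting .prof file shows where the time actually goes, which is far more useful here than the summary statistics from -sstderr.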

Avoid Costly Primitives

If you need the entire list in memory (i.e. it isn't going to be optimized away) then consider unboxed vectors. If Int will do instead of Integer, consider that (but Integer is a reasonable default when you don't know!). Use worker/wrapper transforms at the right times. If you're leaning heavily on Data.Map, try using Data.HashMap from the unordered-containers library. This list can go on and on, but since you don't already have an intuition on where your computation time is going, the profiling should come first!
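To illustrate the Int-versus-Integer point, here is a minimal sketch (the function name is mine, not from the question): a strict fold over fixed-width Int values can stay unboxed, whereas the same fold over Integer allocates an arbitrary-precision heap object at every step.

```haskell
import Data.List (foldl')

-- Strict left fold with a fixed-width Int accumulator. With the lazy
-- foldl, or with an Integer accumulator, each step would instead build
-- a thunk or a heap-allocated arbitrary-precision number.
sumInts :: [Int] -> Int
sumInts = foldl' (+) 0
```

The same principle applies to the combination lists in the question: the tighter the representation of each element, the less the garbage collector has to chase.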

answered Oct 29 '22 18:10 by Thomas M. DuBuisson


I think there is no way. Notice that the time to generate the list grows with each list involved, so you get around 1,000,000³ combinations to check, which indeed takes a lot of time. Caching the list is possible, but is unlikely to change anything, since new elements can be generated almost instantly. The only way out is probably to change the algorithm.

If millionsOfCombinations is a constant (a binding with no arguments, not a function), it is evaluated only once and cached automatically. Otherwise, make it a constant by using a where clause:

aLotOfCombinationsOfCombinations = [[comb1, comb2, comb3] | 
  comb1 <- millionsOfCombinations, 
  comb2 <- millionsOfCombinations,
  comb3 <- someList,
  ...around 10 function calls to find if
    [comb1, comb2, comb3] is actually useful] where

  millionsOfCombinations = makeCombination xyz
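(makeCombination and xyz above are placeholders from the answer.) A minimal self-contained sketch of the same sharing behavior, with names of my own: a zero-argument top-level binding is a constant applicative form (CAF), so GHC evaluates it at most once and reuses the cached result at every use site.

```haskell
-- expensiveList is a CAF: it has no arguments, so it is evaluated at
-- most once and then shared by every expression that mentions it.
expensiveList :: [Int]
expensiveList = [x * y | x <- [1 .. 1000], y <- [1 .. 1000]]

-- Both generators walk the same cached list; the comprehension does
-- not rebuild expensiveList for each value of a.
somePairs :: [(Int, Int)]
somePairs = [(a, b) | a <- take 3 expensiveList, b <- take 3 expensiveList]
```

This is exactly why caching alone doesn't save the original program: the list is already shared, and the cost lives in the sheer number of combinations examined.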
answered Oct 29 '22 18:10 by fuz