I have functions like:
    millionsOfCombinations = [[a, b, c, d] |
        a <- filter (...some filter...) someListOfAs,
        b <- (...some other filter...) someListOfBs,
        c <- someListOfCs, d <- someListOfDs]
    aLotOfCombinationsOfCombinations = [[comb1, comb2, comb3] |
        comb1 <- millionsOfCombinations,
        comb2 <- millionsOfCombinations,
        comb3 <- someList,
        ...around 10 function calls to find if
        [comb1, comb2, comb3] is actually useful]
Evaluating millionsOfCombinations takes 40 seconds on a very fast workstation. Evaluating aLotOfCombinationsOfCombinations !! 0 took 2 days :-(
How can I speed up this code? So far I've had two ideas. First, use a profiler: I tried running myapp +RTS -sstderr after compiling with GHC, but I just get a blank screen, and I don't want to wait days for it to finish.
My second thought was to somehow cache millionsOfCombinations. Do I understand correctly that for each value in aLotOfCombinationsOfCombinations, millionsOfCombinations gets evaluated multiple times? If that is so, how can I cache the result? Obviously I've only just started learning Haskell. I know there is a way to do call caching with a monad, but I still don't understand those things.
Use the -fforce-recomp, -O2 and -fllvm flags

If you aren't already, be sure to use the above flags. I wouldn't normally mention it, but I've seen some questions recently whose authors didn't realize that full optimization isn't enabled by default.
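As a concrete invocation (a sketch assuming your program lives in Main.hs and that an LLVM toolchain is installed for -fllvm; adjust the file name to your project):

    ghc -O2 -fllvm -fforce-recomp Main.hs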
Profile Your Code
The -sstderr flag isn't exactly profiling. When people say profiling they're usually talking about either heap profiling or time profiling via the -prof and -auto-all flags.
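A typical time-profiling run looks like this (a sketch assuming a Main.hs entry point; on newer GHCs, -fprof-auto replaces the older -auto-all):

    ghc -O2 -prof -auto-all -fforce-recomp Main.hs
    ./Main +RTS -p -RTS   # writes a time profile to Main.prof

The Main.prof report lists cost centres ranked by time and allocation, which tells you where to focus your optimization effort.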
Avoid Costly Primitives
If you need the entire list in memory (i.e. it isn't going to be optimized away), then consider unboxed vectors. If Int will do instead of Integer, consider that (but Integer is a reasonable default when you don't know!). Use worker/wrapper transforms at the right times. If you're leaning heavily on Data.Map, try using Data.HashMap from the unordered-containers library. This list can go on and on, but since you don't yet have an intuition for where your computation time is going, the profiling should come first!
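To illustrate the unboxed-vector point (a minimal sketch using the vector package; sumSquares is an invented example, not code from the question):

    import qualified Data.Vector.Unboxed as U

    -- An unboxed vector stores its Ints contiguously, avoiding the
    -- pointer-per-element overhead of an ordinary [Int].
    sumSquares :: U.Vector Int -> Int
    sumSquares = U.sum . U.map (\x -> x * x)

    main :: IO ()
    main = print (sumSquares (U.enumFromTo 1 1000000))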
I think that there is no way around it. Notice that the time to generate the list grows with each list involved, so you get around 1,000,000^3 combinations to check, which indeed takes a lot of time. Caching the list is possible, but is unlikely to change anything, since new elements can be generated almost instantly. The only way is probably to change the algorithm.
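One common form of that change (a hedged sketch; the Combination type, the lists, and the cheapTest*/expensiveTest predicates below are invented stand-ins for the question's data and its ~10 usefulness checks): push the cheapest rejection tests as early in the comprehension as possible, so a failing comb1 is discarded before it is ever paired with every comb2.

    type Combination = [Int]

    -- Invented stand-ins for the question's actual data and checks:
    millionsOfCombinations :: [Combination]
    millionsOfCombinations = [[a, b] | a <- [1 .. 1000], b <- [1 .. 1000]]

    someList :: [Combination]
    someList = [[c] | c <- [1 .. 100]]

    cheapTest1 :: Combination -> Bool
    cheapTest1 c1 = even (head c1)

    cheapTest2 :: Combination -> Combination -> Bool
    cheapTest2 c1 c2 = head c1 /= head c2

    expensiveTest :: Combination -> Combination -> Combination -> Bool
    expensiveTest c1 c2 c3 = sum (c1 ++ c2 ++ c3) `mod` 7 == 0

    usefulTriples :: [[Combination]]
    usefulTriples =
      [ [c1, c2, c3]
      | c1 <- millionsOfCombinations
      , cheapTest1 c1            -- reject c1 before pairing it with every c2
      , c2 <- millionsOfCombinations
      , cheapTest2 c1 c2         -- reject the pair before looping over c3
      , c3 <- someList
      , expensiveTest c1 c2 c3   -- run the costliest check last
      ]

    main :: IO ()
    main = print (take 3 usefulTriples)  -- laziness: only a prefix is computed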
If millionsOfCombinations is a constant (and not a function with arguments), it is cached automatically. Otherwise, make it a constant by using a where clause:
    aLotOfCombinationsOfCombinations = [[comb1, comb2, comb3] |
        comb1 <- millionsOfCombinations,
        comb2 <- millionsOfCombinations,
        comb3 <- someList,
        ...around 10 function calls to find if
        [comb1, comb2, comb3] is actually useful]
      where
        millionsOfCombinations = makeCombination xyz
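To see the sharing behaviour in isolation (a minimal self-contained sketch; expensiveList is an invented stand-in for millionsOfCombinations): a top-level constant is a constant applicative form (CAF), which GHC evaluates at most once and then shares across all uses.

    -- expensiveList is a CAF: it is computed on first use and the
    -- result is reused by every later reference.
    expensiveList :: [Int]
    expensiveList = [x * x | x <- [1 .. 1000000]]

    main :: IO ()
    main = do
      print (sum expensiveList)     -- forces and caches the whole list
      print (length expensiveList)  -- reuses the already-built list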