Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Haskell: can't understand the bottleneck

I solved a Project Euler problem and then confronted my solution with the one on the Haskell wiki. They were pretty similar, but mine was taking 7.5 seconds, while the other 0.6! I compiled them both.

Mine looks as follows:

main = print . maximumBy (compare `on` cycleLength) $ [1..999]
        where cycleLength d = remainders d 10 []

and the one of the wiki:

main = print . fst $ maximumBy (comparing snd) [(n, cycleLength n) | n <- [1..999]]
        where cycleLength d = remainders d 10 []

I also tried to change compare `on` with comparing cycleLength but the performance remained the same.
So I must conclude that all the difference lays in computing values on the fly vs. doing the transformation in the list comprehension.

The difference in time is pretty huge though: the second version has 12.5x speedup!

like image 457
rubik Avatar asked Jul 16 '14 06:07

rubik


1 Answers

The maximumBy function will repeatedly check the same number in your list multiple times — and every time it checks a number, it will have to re-compute cycleLength. And that's an expensive operation!

The wiki algorithm thus uses a technique known as decorate-sort-undecorate. Now, here you're not sorting, but it's close enough. You first precompute the cycleLength values for all numbers, (i.e. you make a 'cache') then you do the maximum operation, and then you undecorate them (using fst.) That way, you save yourself a lot of computing!

EDIT: to illustrate it, have a look at the maximumBy function in the Data.List source:

-- | The 'maximumBy' function takes a comparison function and a list
-- and returns the greatest element of the list by the comparison function.
-- The list must be finite and non-empty.
maximumBy               :: (a -> a -> Ordering) -> [a] -> a
maximumBy _ []          =  error "List.maximumBy: empty list"
maximumBy cmp xs        =  foldl1 maxBy xs
                        where
                           maxBy x y = case cmp x y of
                                      GT -> x
                                      _  -> y

It moves in a window of 2; every number is requested (and in your case computed) twice. This means that for 999 iterations, your version will call cycleLength d 1996 times (n*2-2), whereas the wiki version would call it 999 (n) times.

This doesn't explain the full delay — only a factor of 2, but the factor was closer to about 10.

Here's the profile of your version,

COST CENTRE entries %time %alloc %time %alloc MAIN 0 0.0 0.0 100.0 100.0 CAF 0 0.0 0.0 100.0 100.0 main 1 0.0 0.0 100.0 100.0 f 1 0.0 0.0 100.0 100.0 maximumBy 1 0.0 0.0 100.0 99.9 maximumBy.maxBy 998 0.0 0.1 100.0 99.9 cycleLength 1996 0.1 0.2 100.0 99.8 remainders 581323 99.3 94.4 99.9 99.7 remainders.r' 581294 0.7 5.2 0.7 5.2

and the wiki version:

COST CENTRE entries %time %alloc %time %alloc MAIN 0 0.0 0.0 100.0 100.0 CAF 0 0.0 0.0 100.0 99.9 main 1 0.0 0.1 100.0 99.9 f' 1 0.0 0.8 100.0 99.8 cycleLength 999 0.2 0.5 100.0 98.6 remainders 95845 98.3 93.0 99.8 98.2 remainders.r' 95817 1.5 5.2 1.5 5.2 maximumBy 1 0.0 0.1 0.0 0.4 maximumBy.maxBy 998 0.0 0.2 0.0 0.2

Looking at the profile here, it seems that your version goes through a lot more allocations (around 10-12 times as many,) but doesn't use a lot more RAM overall. So we need to explain about a cumulative factor of 5 or 6 in terms of allocations.

Remainders is recursive. In your example, it gets called 581294 times. In the wiki example, it gets called 95817 times. There's our 5-6 fold increase!

So I think the compare call here is also a problem. Since it applies cycleLength to both things we want to compare as well! In the wiki problem, cycleLength gets applied to every number, but here, we have it applied to every number twice, and the comparison seems to be applied more often, and that is a problem especially with the bigger numbers, since remainders has a bad complexity (it seems exponential, but I'm not sure.)

Since the maximum memory consumption of both programs isn't that dramatically different, I don't think this has anything to do with the heap.

like image 83
Aleksandar Dimitrov Avatar answered Oct 15 '22 00:10

Aleksandar Dimitrov