I solved a Project Euler problem and then compared my solution with the one on the Haskell wiki. They were pretty similar, but mine took 7.5 seconds while the wiki's took 0.6! Both were compiled.
Mine looks as follows:
main = print . maximumBy (compare `on` cycleLength) $ [1..999]
  where cycleLength d = remainders d 10 []
and the one from the wiki:
main = print . fst $ maximumBy (comparing snd) [(n, cycleLength n) | n <- [1..999]]
  where cycleLength d = remainders d 10 []
I also tried to change compare `on` cycleLength to comparing cycleLength, but the performance remained the same.
So I must conclude that all the difference lies in computing the values on the fly versus precomputing them in the list comprehension. The time difference is pretty huge though: the second version is a 12.5x speedup!
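For completeness, remainders computes the length of the recurring cycle of 1/d by long division. A self-contained sketch of the wiki-style version along those lines (not the exact code from either program, but equivalent in spirit):

import Data.List (maximumBy)
import Data.Ord (comparing)

-- Walk the long division of 1/d, collecting remainders until one
-- repeats (cycle found) or the division terminates at 0 (no cycle).
cycleLength :: Int -> Int
cycleLength d = go (1 `mod` d) []
  where
    go 0 _    = 0                        -- terminates: no recurring cycle
    go r seen = case lookup r (zip seen [0..]) of
      Just i  -> i + 1                   -- r repeated i+1 steps ago: cycle length
      Nothing -> go ((r * 10) `mod` d) (r : seen)

main :: IO ()
main = print . fst $ maximumBy (comparing snd) [(n, cycleLength n) | n <- [1..999]]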
The maximumBy function will check the same numbers in your list repeatedly, and every time it checks a number it has to re-compute cycleLength. And that's an expensive operation!
The wiki algorithm thus uses a technique known as decorate-sort-undecorate. You're not sorting here, but it's close enough: you first precompute the cycleLength values for all numbers (i.e. you build a cache), then you do the maximum operation, and then you undecorate the result (using fst). That way, you save yourself a lot of computing!
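The pattern generalizes nicely: pair each element with its key once, compare on the cached keys, then strip the pair. A small helper sketch (the extra package ships a similar maximumOn, if you'd rather not write it yourself):

import Data.List (maximumBy)
import Data.Ord (comparing)

-- decorate:   pair each element with its (expensive) key
-- operate:    compare only the cached keys (snd)
-- undecorate: drop the key again with fst
maximumOn :: Ord b => (a -> b) -> [a] -> a
maximumOn f = fst . maximumBy (comparing snd) . map (\x -> (x, f x))

With that, the wiki program becomes just print (maximumOn cycleLength [1..999]).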
EDIT: to illustrate this, have a look at the maximumBy function in the Data.List source:
-- | The 'maximumBy' function takes a comparison function and a list
-- and returns the greatest element of the list by the comparison function.
-- The list must be finite and non-empty.
maximumBy :: (a -> a -> Ordering) -> [a] -> a
maximumBy _ [] = error "List.maximumBy: empty list"
maximumBy cmp xs = foldl1 maxBy xs
  where
    maxBy x y = case cmp x y of
                  GT -> x
                  _  -> y
It moves in a window of 2; every element is requested (and, in your case, its cycleLength computed) twice. This means that for the 999 numbers, your version calls cycleLength 1996 times (n*2-2: two key computations for each of the 998 comparisons), whereas the wiki version calls it only 999 (n) times.
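You can watch the double evaluation happen with Debug.Trace (a toy key function here, purely for illustration):

import Data.Function (on)
import Data.List (maximumBy)
import Debug.Trace (trace)

-- Toy stand-in for an expensive key; logs to stderr on every call.
key :: Int -> Int
key n = trace ("key " ++ show n) (n `mod` 7)

main :: IO ()
main = print $ maximumBy (compare `on` key) [1..5]
-- 4 comparisons x 2 key evaluations each = 8 "key" lines, not 5.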
This doesn't explain the full delay; it only accounts for a factor of 2, while the observed factor was closer to 12.
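For reference, profiles like the ones below come from GHC's cost-centre profiler; assuming a reasonably recent GHC and a source file named euler26.hs (the name is arbitrary), something like this produces them:

ghc -O2 -prof -fprof-auto -rtsopts euler26.hs
./euler26 +RTS -p    # writes euler26.prof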
Here's the profile of your version:
                                 individual     inherited
COST CENTRE            entries  %time %alloc   %time %alloc
MAIN                         0    0.0    0.0   100.0  100.0
 CAF                         0    0.0    0.0   100.0  100.0
  main                       1    0.0    0.0   100.0  100.0
   f                         1    0.0    0.0   100.0  100.0
    maximumBy                1    0.0    0.0   100.0   99.9
     maximumBy.maxBy       998    0.0    0.1   100.0   99.9
      cycleLength         1996    0.1    0.2   100.0   99.8
       remainders       581323   99.3   94.4    99.9   99.7
        remainders.r'   581294    0.7    5.2     0.7    5.2
and the wiki version:
                                 individual     inherited
COST CENTRE            entries  %time %alloc   %time %alloc
MAIN                         0    0.0    0.0   100.0  100.0
 CAF                         0    0.0    0.0   100.0   99.9
  main                       1    0.0    0.1   100.0   99.9
   f'                        1    0.0    0.8   100.0   99.8
    cycleLength            999    0.2    0.5   100.0   98.6
     remainders          95845   98.3   93.0    99.8   98.2
      remainders.r'      95817    1.5    5.2     1.5    5.2
    maximumBy                1    0.0    0.1     0.0    0.4
     maximumBy.maxBy       998    0.0    0.2     0.0    0.2
Looking at the profiles, your version performs a lot more allocations (around 10-12 times as many) but doesn't use much more RAM overall. So we still need to explain a remaining factor of about 5 or 6 in terms of allocations.
remainders is recursive. In your version it is entered 581323 times; in the wiki version, only 95845 times (581323 / 95845 ≈ 6). There's our 5-6 fold increase!
So I think the compare call here is also a problem, since it applies cycleLength to both of the values being compared. In the wiki version cycleLength gets applied to every number exactly once, but here the running maximum is re-evaluated at every comparison, and the running maximum is, by definition, the number with the longest cycle found so far. The recomputation is therefore biased towards the most expensive inputs, which hurts especially with the bigger numbers, since remainders has a bad complexity (it seems exponential, but I'm not sure).
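If you want to keep the compare `on` style, you can also cache the keys yourself. A sketch using a lazy immutable array (reusing the hypothetical cycleLength from the sketch earlier): each cell is a thunk that is evaluated at most once and then shared, so every cycleLength n is computed once no matter how often maximumBy asks for it.

import Data.Array (listArray, (!))
import Data.Function (on)
import Data.List (maximumBy)

cycleLength :: Int -> Int        -- the long-division sketch from earlier
cycleLength d = go (1 `mod` d) []
  where go 0 _    = 0
        go r seen = maybe (go ((r * 10) `mod` d) (r : seen)) (+ 1)
                          (lookup r (zip seen [0..]))

-- Lazy array cache: each entry computed on first demand, then shared.
cachedLength :: Int -> Int
cachedLength = (table !)
  where table = listArray (1, 999) [cycleLength n | n <- [1..999]]

main :: IO ()
main = print $ maximumBy (compare `on` cachedLength) [1..999]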
Since the maximum memory consumption of both programs isn't that dramatically different, I don't think this has anything to do with the heap.