This pertains to Emily's answer here: https://stackoverflow.com/a/13850560/2026752
ansMap :: M.Map Integer Int
ansMap = M.fromAscList [(i, collatz i) | i <- [1..1000000]]
where collatz 1 = 0
collatz x = if x' <= 1000000 then 1 + ansMap M.! x'
else 1 + collatz x'
where x' = if even x then x `div` 2 else x*3 + 1
-- this code is really fast
fst $ maximumBy (comparing snd) $ M.toList ansMap
This seemed like a reasonable strategy, so I decided to take 1000000 and feed it as a variable to the function, so I could compute the Collatz sequence for even more numbers:
ansMap :: Integer -> M.Map Integer Int
ansMap n = M.fromAscList [(i, collatz i) | i <- [1..n]]
where collatz 1 = 0
collatz x = if x' <= n then 1 + ansMap n M.! x'
else 1 + collatz x'
where x' = if even x then x `div` 2 else x*3 + 1
-- but then suddenly this is slow
fst $ maximumBy (comparing snd) $ M.toList ansMap 1000000
This confuses me, since all I did was take n out and pass it back in! I don't know much about the Haskell runtime. Please help me understand! Thank you in advance.
In Emily's answer, ansMap is a M.Map Integer Int which is recursively defined in terms of itself. In your modified code, ansMap is a function, and ansMap n returns a M.Map Integer Int that is recursively defined in terms of other calls to ansMap. But those recursive calls to ansMap themselves construct and return a (distinct) M.Map Integer Int, which ends up making tons of different M.Map Integer Ints until the recursion bottoms out.
You can fix this simply by making it so that there is once again a M.Map Integer Int that is recursively defined in terms of itself, rather than having a recursively defined function.
ansMap :: Integer -> M.Map Integer Int
ansMap n = mapping
where mapping = M.fromAscList [(i, collatz i) | i <- [1..n]]
collatz 1 = 0
collatz x = if x' <= n then 1 + mapping M.! x'
else 1 + collatz x'
where x' = if even x then x `div` 2 else x*3 + 1
Note that collatz uses mapping, and does not make a recursive call to ansMap -- so ansMap is not recursively defined (though mapping is).
That was an asymptotic speedup; there are also constant-factor speedups available. For example, switching Integer to Int, switching Map to Vector, and doing bit-twiddly operations in the definition of x' makes it about an order of magnitude faster. If switching to Int makes you nervous, staying with Integer is only about 50% slower.
ansMap :: Int -> V.Vector Int
ansMap n = mapping
where mapping = V.generate (n+1) collatz
collatz 0 = -1
collatz 1 = 0
collatz x = if x' <= n then 1 + mapping V.! x'
else 1 + collatz x'
where x' = if x .&. 1 == 0 then shiftR x 1 else x*3 + 1
V.maxIndex $ ansMap 1000000
This seems to be a question about sharing, as is covered in Chapter 27 of "Haskell Programming from First Principles".
Being a function with explicit, named arguments also prevents sharing. Haskell is not fully lazy; it is merely non-strict, so it is not required to remember the result of every function application for a given set of arguments, nor would it be desirable given memory constraints. (pp. 1053-1054)
The book recommends using Debug.Trace to identify what is or is not being shared.
If you try these:
import Debug.Trace
ansMap = (trace "eval" M.fromAscList [(i, collatz i) | i <- [1..1000000]])
ansMap n = (trace "eval" M.fromAscList [(i, collatz i) | i <- [1..n]])
you will notice that in the first version 'ansMap' is only evaluated once, but in the case of the function with the named argument, it is evaluated many times. In the first case, the map is able to be shared, but in the second the named argument prevents sharing.
(At least this is how I understand it.)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With