I wrote a piece of Haskell code to calculate the length of a Collatz chain. Given a number n, the next number in the sequence is n/2 if n is even or 3*n+1 if n is odd. The sequence ends when it has converged to 1. I wanted to find the length of the longest chain when starting from any number below some input number.
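For reference, a direct, unmemoized version of this computation (chainLength is just an illustrative name, not the code I actually used) would look like:

```haskell
-- Naive Collatz chain length, no memoization.
-- Counts the terms of the sequence, including the starting number and the
-- final 1, so the chain starting at 1 has length 1.
chainLength :: Int -> Int
chainLength 1 = 1
chainLength n
  | even n    = 1 + chainLength (n `div` 2)
  | otherwise = 1 + chainLength (3 * n + 1)
```

For example, the chain from 6 is 6, 3, 10, 5, 16, 8, 4, 2, 1, giving a length of 9.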
I tried implementing the length calculation as a memoized function, since I expected many chains to pass through numbers whose chain lengths had already been computed. Thus, the length of the chain starting from 726 would be just 1 + the length of the chain starting from 363, which would already have been calculated. My code is shown below.
collatz :: Int -> Int
collatz n
  | even n    = n `div` 2
  | otherwise = 3 * n + 1

collatzLength :: Int -> Int
collatzLength = (fmap len [0 ..] !!)
  where
    len 0 = 0
    len 1 = 1
    len n = 1 + (collatzLength . collatz $ n)

maxLengthBelow :: Int -> Int
maxLengthBelow = foldl1 max . fmap collatzLength . enumFromTo 1

main :: IO ()
main = print $ maxLengthBelow 10000
This code works, but it takes a huge amount of memory. Profiling a run of main with the input 10000 shows that len is only called 21664 times, as expected, but the program takes 16 seconds and 4.5Gb of memory! What is taking up all that memory? I would have expected the memoized function to produce a fast, low-memory solution.
Memoization is an optimization technique that speeds up a function by caching its previously computed results. In impure programming languages, this is typically done with a mutable map from arguments to results.
Most Haskell compilers and interpreters (all of them that I know of, actually) do not memoize polymorphic structures: a memo list with a polymorphic type is recreated every time the function is called, while a monomorphic one is built once and shared across calls.
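To illustrate the shared, monomorphic case (memoFib and table are hypothetical names, not from the question): because memoFib takes no explicit argument and its table has the monomorphic type [Integer], the list is a constant applicative form that GHC evaluates at most once, so every call shares the same partially evaluated list.

```haskell
-- Fibonacci with list-based memoization, the same shape as the
-- question's collatzLength.
memoFib :: Int -> Integer
memoFib = (table !!)
  where
    table :: [Integer]      -- monomorphic: built once, shared by all calls
    table = map f [0 ..]
    f 0 = 0
    f 1 = 1
    f n = memoFib (n - 1) + memoFib (n - 2)
```

Had table been given a polymorphic type like Num a => [a], it could not be shared between calls, and the memoization would silently disappear.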
Is memoization the same as caching? Yes, kind of: memoization is a specific type of caching. While caching can refer in general to any technique for storing results for future use (like HTTP caching), memoizing specifically means caching the return values of a function.
One of the things that makes the Collatz sequence so fun is that there are some small starting seeds that take you way, way out into the atmosphere on their way to 1. In particular, 9663 makes it all the way out to 27114424 before it collapses -- and that's a long memoization list to build!
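That peak can be checked with a small helper (trajectoryMax is a name I'm introducing here, using the same step rule as the question):

```haskell
-- Largest value reached on the Collatz trajectory of n on its way to 1.
trajectoryMax :: Int -> Int
trajectoryMax = go 1
  where
    go best 1 = max best 1
    go best n
      | even n    = go (max best n) (n `div` 2)
      | otherwise = go (max best n) (3 * n + 1)
```

For example, the trajectory of 3 is 3, 10, 5, 16, 8, 4, 2, 1, so trajectoryMax 3 is 16; trajectoryMax 9663 is the 27114424 quoted above.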
And, for what it's worth, I expect your memoization list to use three machine words per element: one for the I# constructor on the Int, one for the contained number, and one for the (:) constructor. Let's ask how much space it would take to store 27114424 elements, then:
> 27114424 * (64*3) / 1024 {-Kb-} / 1024 {-Mb-} / 1024 {-Gb-}
4.8484368324279785
So 4.5Gb sounds about right, perhaps even a little low.
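One way to keep the table small, sketched here as a suggestion rather than anything from the original answer (maxLengthBelow' and lenWith are names I'm assuming), is to cache chain lengths only for seeds up to the input bound, computing values that overshoot the bound without storing them. The table then never grows past the bound, regardless of how high a trajectory climbs:

```haskell
import qualified Data.Map.Strict as Map
import Data.List (foldl')

-- Chain length of n, threading a cache that only stores keys <= bound.
lenWith :: Int -> Map.Map Int Int -> Int -> (Map.Map Int Int, Int)
lenWith bound cache n
  | n == 1 = (cache, 1)
  | Just l <- Map.lookup n cache = (cache, l)
  | otherwise =
      let next            = if even n then n `div` 2 else 3 * n + 1
          (cache', rest)  = lenWith bound cache next
          total           = 1 + rest
          -- Only remember results for small keys; big intermediate
          -- values (like 27114424) are recomputed if ever needed again.
          cache''         = if n <= bound then Map.insert n total cache'
                                          else cache'
      in (cache'', total)

-- Same result as the question's maxLengthBelow, bounded cache.
maxLengthBelow' :: Int -> Int
maxLengthBelow' bound = snd (foldl' step (Map.empty, 0) [1 .. bound])
  where
    step (cache, best) n =
      let (cache', l) = lenWith bound cache n
      in (cache', max best l)
```

The trade-off is recomputing lengths for the rare out-of-range values, in exchange for a cache of at most bound entries instead of one proportional to the largest value any trajectory reaches.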