I'm trying to resolve problem 14 of Project Euler (http://projecteuler.net/problem=14) and I hit a dead end using Haskell.
Now, I know that the numbers may be small enough and I could do a brute force, but that isn't the purpose of my exercise.
I am trying to memorize the intermediate results in a Map
of type Map Integer (Bool, Integer)
with the meaning of:
- the first Integer (the key) holds the number
- the Tuple (Bool, Interger) holds either (True, Length) or (False, Number)
where Length = length of the chain
Number = the number before him
Ex:
for 13: the chain is 13 → 40 → 20 → 10 → 5 → 16 → 8 → 4 → 2 → 1
My map should contain :
13 - (True, 10)
40 - (False, 13)
20 - (False, 40)
10 - (False, 20)
5 - (False, 10)
16 - (False, 5)
8 - (False, 16)
4 - (False, 8)
2 - (False, 4)
1 - (False, 2)
Now when I search for another number like 40
i know that the chain has (10 - 1) length
and so on.
I want now, if I search for 10, not only to tell me that length of 10 is (10 - 3) length
and update the map, but also I want to update 20, 40 in case they are still (False, _)
My code:
import Data.Map as Map
solve :: [Integer] -> Map Integer (Bool, Integer)
solve xs = solve' xs Map.empty
where
solve' :: [Integer] -> Map Integer (Bool, Integer) -> Map Integer (Bool, Integer)
solve' [] table = table
solve' (x:xs) table =
case Map.lookup x table of
Nothing -> countF x 1 (x:xs) table
Just (b, _) ->
case b of
True -> solve' xs table
False -> {-WRONG-} solve' xs table
f :: Integer -> Integer
f x
| x `mod` 2 == 0 = x `quot` 2
| otherwise = 3 * x + 1
countF :: Integer -> Integer -> [Integer] -> Map Integer (Bool, Integer) -> Map Integer (Bool, Integer)
countF n cnt (x:xs) table
| n == 1 = solve' xs (Map.insert x (True, cnt) table)
| otherwise = countF (f n) (cnt + 1) (x:xs) $ checkMap (f n) n table
checkMap :: Integer -> Integer -> Map Integer (Bool, Integer) -> Map Integer (Bool, Integer)
checkMap n rez table =
case Map.lookup n table of
Nothing -> Map.insert n (False, rez) table
Just _ -> table
At the {-WRONG-} part we should update all the values like in the following example:
--We are looking for 10:
10 - (False, 20)
|
V {-finally-} update 10 => (True, 10 - 1 - 1 - 1)
20 - (False, 40) ^
| |
V update 20 => 20 - (True, 10 - 1 - 1)
40 - (False, 13) ^
| |
V update 40 => 40 - (True, 10 - 1)
13 - (True, 10) ^
| |
---------------------------
The problem is that I don't know if its possible to do 2 things in a function like updating a number and continue the recurence. In a C
like language I may do something like (pseudocode):
void f(int n, tuple(b,nr), int &length, table)
{
if(b == False) f (nr, (table lookup nr), 0, table);
// the bool is true so we got a length
else
{
length = nr;
return;
}
// Since this is a recurence it would work as a stack, producing the right output
table update(n, --cnt);
}
The last instruction would work since we are sending cnt by reference. Also we always know that it will finish at some point and cnt should not be < 1.
The easiest optimization (as you have identified) is memoization. You have attempted create a memoization system yourself, however have come across issues on how to store the memoized values. There are solutions to doing this in a maintainable way, such as using a State monad or a STArray. However, there is a much simpler solution to your problem - use haskell's existing memoization. Haskell by default remembers constant values, so if you create a value that stores the collatz values, it will be automatically memoized!
A simple example of this is the following fibonacci definition:
fib :: Int -> Integer
fib n = fibValues !! n where
fibValues = 1 : 1 : zipWith (+) fibValues (tail fibValues)
The fibValues
is a [Integer]
, and as it is just a constant value, it is memoized. However, that doesn't mean it is all memoized at once, since as it is an infinte list, this would never finish. Instead, the values are only calculated when needed, as haskell is lazy.
So if you do something similar with your problem, you will get memoization without a lot of the work. However, using a list like above won't work well in your solution. This is because the collatz algorithm uses many different values to get the result for a given number, so the container used will require random access to be efficient. The obvious choice is an array.
collatzMemoized :: Array Integer Int
Next, we need to fill up the array with the correct values. I'll write this function pretending a collatz
function exists that calculates the collatz value for any n. Also, note that arrays are fixed size, so a value needs to be used to determine the maximum number to memoize. I'll use a million, but any value can be used (it is a memory/speed tradeoff).
collatzMemoized = listArray (1, maxNumberToMemoize) $ map collatz [1..maxNumberToMemoize] where
maxNumberToMemroize = 1000000
That is pretty straightforward, the listArray
is given bounds, and the a list of all the collatz values in that range is given to it. Remember that this won't calculate all the collatz values straight away, as the values are lazy.
Now, the collatz function can be written. The most important part is to only check the collatzMemoized
array if the number being checked is within its bounds:
collatz :: Integer -> Int
collatz 1 = 1
collatz n
| inRange (bounds collatzMemoized) nextValue = 1 + collatzMemoized ! nextValue
| otherwise = 1 + collatz nextValue
where
nextValue = case n of
1 -> 1
n | even n -> n `div` 2
| otherwise -> 3 * n + 1
In ghci, you can now see the effectiveness of the memoization. Try collatz 200000
. It will take about 2 seconds to finish. However, if you run it again, it will complete instantly.
Finally, the solution can be found:
maxCollatzUpTo :: Integer -> (Integer, Int)
maxCollatzUpTo n = maximumBy (compare `on` snd) $ zip [1..n] (map collatz [1..n]) where
and then printed:
main = print $ maxCollatzUpTo 1000000
If you run main, the result will be printed in about 10 seconds.
Now, a small problem with this approach is it uses a lot of stack space. It will work fine in ghci (which seems to use be more flexible with regards to stack space). However, if you compile it and try to run the executable, it will crash (with a stack space overflow). So to run the program, you have to specify more when you compile it. This can be done by adding -with-rtsopts='K64m'
to the compile options. This increases the stack to 64mb.
Now the program can be compiled and ran:
> ghc -O3 --make -with-rtsopts='-K6m' problem.hs
Running ./problem
will give the result in less than a second.
You are going about memoization the hard way, trying to write an imperative program in Haskell. Borrowing from David Eisenstat's solution, we'll solve it as j_random_hacker suggested:
collatzLength :: Integer -> Integer
collatzLength n
| n == 1 = 1
| even n = 1 + collatzLength (n `div` 2)
| otherwise = 1 + collatzLength (3*n + 1)
The dynamic programming solution for this is to replace the recursion with looking things up in a table. Let's make a function where we can replace the recursive call:
collatzLengthDef :: (Integer -> Integer) -> Integer -> Integer
collatzLengthDef r n
| n == 1 = 1
| even n = 1 + r (n `div` 2)
| otherwise = 1 + r (3*n + 1)
Now we could define the recursive algorithm as
collatzLength :: Integer -> Integer
collatzLength = collatzLengthDef collatzLength
Now we could also make a tabled version of this (it takes a number for the table size, and returns a collatzLength function that is calculated using a table of that size):
-- A utility function that makes memoizing things easier
buildTable :: (Ix i) => (i, i) -> (i -> e) -> Array i e
buildTable bounds f = array $ map (\x -> (x, f x)) $ range bounds
collatzLengthTabled :: Integer -> Integer -> Integer
collatzLengthTabled n = collatzLengthTableLookup
where
bounds = (1, n)
table = buildTable bounds (collatzLengthDef collatzLengthTableLookup)
collatzLengthTableLookup =
\x -> Case inRange bounds x of
True -> table ! x
_ -> (collatzLengthDef collatzLengthTableLookup) x
This works by defining the collatzLength to be a table lookup, with the table being the definition of the function, but with recursive calls replaced by table lookup. The table lookup function checks to see if the argument to the function is in the range that is tabled, and falls back on the definition of the function. We can even make this work for tabling any function like this:
tableRange :: (Ix a) => (a, a) -> ((a -> b) -> a -> b) -> a -> b
tableRange bounds definition = tableLookup
where
table = buildTable bounds (definition tableLookup)
tableLookup =
\x -> Case inRange bounds x of
True -> table ! x
_ -> (definition tableLookup) x
collatzLengthTabled n = tableRange (1, n) collatzLengthDef
You just need to make sure that you
let memoized = collatzLengthTabled 10000000
... memoized ...
So that only one table is built in memory.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With