I'm currently working on project euler problem 14.
I solved it using a poorly coded program, without memoization, that took 386 5 seconds to run (see edit).
Here it is:
step :: (Integer, Int) -> Integer -> (Integer, Int)
step (i, m) n | nextValue > m = (n, nextValue)
| otherwise = (i, m)
where nextValue = syr n 1
syr :: Integer -> Int -> Int
syr 1 acc = acc
syr x acc | even x = syr (x `div` 2) (acc + 1)
| otherwise = syr (3 * x + 1) (acc + 1)
p14 = foldl step (0, 0) [500000..999999]
My question is about several comments in the thread related to this problem, where were mentionned execution times of <1 s for programs as follow (C code, credits to the project euler forum user ix for the code -- note: I didn't check that the execution time is in fact as mentionned):
#include <stdio.h>
int main(int argc, char **argv) {
int longest = 0;
int terms = 0;
int i;
unsigned long j;
for (i = 1; i <= 1000000; i++) {
j = i;
int this_terms = 1;
while (j != 1) {
this_terms++;
if (this_terms > terms) {
terms = this_terms;
longest = i;
}
if (j % 2 == 0) {
j = j / 2;
} else {
j = 3 * j + 1;
}
}
}
printf("longest: %d (%d)\n", longest, terms);
return 0;
}
To me, those programs are kind of the same, when talking about the algorithm.
So I wonder why there is such a big difference? Or is there any fondamental difference between our two algorithms that can justify a x6 factor in performance?
BTW, I'm currently trying to implement this algorithm with memoization, but am kind of lost as to me, it's way easier to implement in an imperative language (and I don't manipulate monads yet so I can't use this paradigm). So if you have any good tutorial that fits a beginner to learn memoization, I'll be glad (the ones I encountered were not detailed enough or out of my league).
Note: I came to declarative paradigm through Prolog and am still in the very early process of discovering Haskell, so I might miss important things.
Note2: any general advice about my code is welcome.
EDIT: thanks to delnan's help, I compiled the program and it now runs in 5 seconds, so I mainly look for hints on memoization now (even if ideas about the existing x6 gap are still welcome).
After having compiled it with optimisations, there are still several differences to the C programme
div
, while the C programme uses machine division (which truncates) [but any self-respecting C compiler transforms that into a shift, so that makes it yet faster], that would be quot
in Haskell; that reduced the run time by some 15% here.Integer
s. If you have 64-bit Int
s in your GHC (64-bit OS other than Windows), replace Integer
with Int
. That reduced the run time by a factor of about 3 here. If you're on a 32-bit system, you're out of luck, GHC doesn't use native 64-bit instructions there, these operations are implemented as C calls, that's still rather slow.For the memoisation, you can outsource it to one of the memoisation packages on hackage, the only one that I remember is data-memocombinators, but there are others. Or you can do it yourself, for example keeping a map of previously calculated values - that would work best in the State
monad,
import Control.Monad.State.Strict
import qualified Data.Map as Map
import Data.Map (Map, singleton)
type Memo = Map Integer Int
syr :: Integer -> State Memo Int
syr n = do
mb <- gets (Map.lookup n)
case mb of
Just l -> return l
Nothing -> do
let m = if even n then n `quot` 2 else 3*n+1
l <- syr m
let l' = l+1
modify (Map.insert n l')
return l'
solve :: Integer -> Int -> Integer -> State Memo (Integer,Int)
solve maxi len start
| len > 1000000 = return (maxi,len)
| otherwise = do
l <- syr start
if len < l
then solve start l (start+1)
else solve maxi len (start+1)
p14 :: (Integer,Int)
p14 = evalState (solve 0 0 500000) (singleton 1 1)
but that will probably not gain too much (not even when you've added the necessary strictness). The trouble is that a lookup in a Map
is not too cheap and an insertion is relatively expensive.
Another method is to keep a mutable array for the lookup. The code becomes more complicated, since you have to choose a reasonable upper bound for the values to cache (should be not much larger than the bound for the starting values) and deal with the parts of the sequences falling outside the memoised range. But an array lookup and write are fast. If you have 64-bit Int
s, the below code runs pretty fast, here it takes 0.03s for a limit of 1 million, and 0.33s for a limit of 10 million, the corresponding (as closely as I reasonably could) C code runs in 0.018 resp. 0.2s.
module Main (main) where
import System.Environment (getArgs)
import Data.Array.ST
import Data.Array.Base
import Control.Monad.ST
import Data.Bits
import Data.Int
main :: IO ()
main = do
args <- getArgs
let bd = case args of
a:_ -> read a
_ -> 100000
print $ collMax bd
next :: Int -> Int
next n
| n .&. 1 == 0 = n `unsafeShiftR` 1
| otherwise = 3*n + 1
collMax :: Int -> (Int,Int16)
collMax upper = runST $ do
arr <- newArray (0,upper) 0 :: ST s (STUArray s Int Int16)
let go l m
| upper < m = go (l+1) $ next m
| otherwise = do
l' <- unsafeRead arr m
case l' of
0 -> do
l'' <- go 1 $ next m
unsafeWrite arr m (l'' + 1)
return (l+l'')
_ -> return (l+l'-1)
collect mi ml i
| upper < i = return (mi, ml)
| otherwise = do
l <- go 1 i
if l > ml
then collect i l (i+1)
else collect mi ml (i+1)
unsafeWrite arr 1 1
collect 1 1 2
Well, the C program uses unsigned long
, but Integer
can store arbitrarily large integers (it's a bignum). If you import Data.Word
, then you can use Word
, which is a machine-word-sized unsigned integer.
After replacing Integer
with Word
, and using ghc -O2
and gcc -O3
, the C program runs in 0.72 seconds, while the Haskell programs runs in 1.92 seconds. 2.6x isn't bad. However, ghc -O2
doesn't always help, and this is one of the programs on which it doesn't! Using just -O
, as you did, brings the runtime down to 1.90 seconds.
I tried replacing div
with quot
(which uses the same type of division as C; they only differ on negative inputs), but strangely it actually made the Haskell program run slightly slower for me.
You should be able to speed up the syr
function with the help of this previous Stack Overflow question I answered about the same Project Euler problem.
On my current system (32-bit Core2Duo) your Haskell code, including all the optimizations given in the answers, takes 0.8s
to compile and 1.2s
to run.
You could transfer the run-time to compile-time, and reduce the run-time to nearly zero.
module Euler14 where
import Data.Word
import Language.Haskell.TH
terms :: Word -> Word
terms n = countTerms n 0
where
countTerms 1 acc = acc + 1
countTerms n acc | even n = countTerms (n `div` 2) (acc + 1)
| otherwise = countTerms (3 * n + 1) (acc + 1)
longestT :: Word -> Word -> (Word, Word)
longestT mi mx = find mi mx (0, 0)
where
find mi mx (ct,cn) | mi == mx = if ct > terms mi then (ct,cn) else (terms mi, mi)
| otherwise = find (mi + 1) mx
(if ct > terms mi then (ct,cn) else (terms mi, mi))
longest :: Word -> Word -> ExpQ
longest mi mx = return $ TupE [LitE (IntegerL (fromIntegral a)),
LitE (IntegerL (fromIntegral b))]
where
(a,b) = longestT mi mx
and
{-# LANGUAGE TemplateHaskell #-}
import Euler14
main = print $(longest 500000 999999)
On my system it takes 2.3s
to compile this but the run-time goes down to 0.003s
. Compile Time Function Execution (CTFE) is something you can't do in C/C++. The only other programming language that I know of that supports CTFE is the D programming language. And just to be complete, the C code takes 0.1s
to compile and 0.7s
to run.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With