I'm working through the problems in Project Euler as a way of learning Haskell, and I find that my programs are a lot slower than a comparable C version, even when compiled. What can I do to speed up my Haskell programs?
For example, my brute-force solution to Problem 14 is:
import Data.Int import Data.Ord import Data.List searchTo = 1000000 nextNumber :: Int64 -> Int64 nextNumber n | even n = n `div` 2 | otherwise = 3 * n + 1 sequenceLength :: Int64 -> Int sequenceLength 1 = 1 sequenceLength n = 1 + (sequenceLength next) where next = nextNumber n longestSequence = maximumBy (comparing sequenceLength) [1..searchTo] main = putStrLn $ show $ longestSequence
Which takes around 220 seconds, while an "equivalent" brute-force C version only takes 1.2 seconds.
#include <stdio.h> int main(int argc, char **argv) { int longest = 0; int terms = 0; int i; unsigned long j; for (i = 1; i <= 1000000; i++) { j = i; int this_terms = 1; while (j != 1) { this_terms++; if (this_terms > terms) { terms = this_terms; longest = i; } if (j % 2 == 0) j = j / 2; else j = 3 * j + 1; } } printf("%d\n", longest); return 0; }
What am I doing wrong? Or am I naive to think that Haskell could even approach C's speed?
(I'm compiling the C version with gcc -O2, and the Haskell version with ghc --make -O).
Faster: producing a program that runs quicker. The key tool to use in making your Haskell program run faster are GHC's profiling facilities, described separately in Chapter 5, Profiling. There is no substitute for finding where your program's time/space is really going, as opposed to where you imagine it is going.
Haskell's laziness actually means that mutability doesn't matter as much as you think it would, plus it's high-level so there are many optimizations the compiler can apply. Thus, modifying a record in-place will rarely be slower than it would in a language such as C.
Over the past year or so he's done a few of the Project Euler problems in Haskell and Python, and he's generally found Haskell to be much faster.
For testing purpose I have just set searchTo = 100000
. The time taken is 7.34s. A few modification leads to some big improvement:
Use an Integer
instead of Int64
. This improves the time to 1.75s.
Use an accumulator (you don't need sequenceLength to be lazy right?) 1.54s.
seqLen2 :: Int -> Integer -> Int seqLen2 a 1 = a seqLen2 a n = seqLen2 (a+1) (nextNumber n) sequenceLength :: Integer -> Int sequenceLength = seqLen2 1
Rewrite the nextNumber
using quotRem
, thus avoiding computing the division twice (once in even
and once in div
). 1.27s.
nextNumber :: Integer -> Integer nextNumber n | r == 0 = q | otherwise = 6*q + 4 where (q,r) = quotRem n 2
Use Schwartzian transform instead of maximumBy
. The problem of maximumBy . comparing
is that the sequenceLength
function is called more than once for each value. 0.32s.
longestSequence = snd $ maximum [(sequenceLength a, a) | a <- [1..searchTo]]
Note:
ghc -O
and run with +RTS -s
)gcc -O3 -m32
.Although this is already rather old, let me chime in, there's one crucial point that hasn't been addressed before.
First, the timings of the different programmes on my box. Since I'm on a 64-bit linux system, they show somewhat different characteristics: using Integer
instead of Int64
does not improve performance as it would with a 32-bit GHC, where each Int64
operation would incur the cost of a C-call while the computations with Integer
s fitting in signed 32-bit integers don't need a foreign call (since only few operations exceed that range here, Integer
is the better choice on a 32-bit GHC).
Integer
instead of Int64
: 33.96 secondsInt
: 1.85 secondsInt
: 1.90 secondsquotRem
instead of divMod
: 1.79 secondsSo what have we?
div
resp. divMod
when it's not necessary, quot
resp. quotRem
are much fasterWhat is still missing?
if (j % 2 == 0) j = j / 2; else j = 3 * j + 1;
Any C compiler I have used transforms the test j % 2 == 0
into a bit-masking and doesn't use a division instruction. GHC does not (yet) do that. So testing even n
or computing n `quotRem` 2
is quite an expensive operation. Replacing nextNumber
in KennyTM's Integer
version with
nextNumber :: Integer -> Integer nextNumber n | fromInteger n .&. 1 == (0 :: Int) = n `quot` 2 | otherwise = 3*n+1
reduces its running time to 3.25 seconds (Note: for Integer
, n `quot` 2
is faster than n `shiftR` 1
, that takes 12.69 seconds!).
Doing the same in the Int
version reduces its running time to 0.41 seconds. For Int
s, the bit-shift for division by 2 is a bit faster than the quot
operation, reducing its running time to 0.39 seconds.
Eliminating the construction of the list (that doesn't appear in the C version either),
module Main (main) where import Data.Bits result :: Int result = findMax 0 0 1 findMax :: Int -> Int -> Int -> Int findMax start len can | can > 1000000 = start | canlen > len = findMax can canlen (can+1) | otherwise = findMax start len (can+1) where canlen = findLen 1 can findLen :: Int -> Int -> Int findLen l 1 = l findLen l n | n .&. 1 == 0 = findLen (l+1) (n `shiftR` 1) | otherwise = findLen (l+1) (3*n+1) main :: IO () main = print result
yields a further small speedup, resulting in a running time of 0.37 seconds.
So the Haskell version that's in close correspondence to the C version doesn't take that much longer, it's a factor of ~1.3.
Well, let's be fair, there's an inefficiency in the C version that's not present in the Haskell versions,
if (this_terms > terms) { terms = this_terms; longest = i; }
appearing in the inner loop. Lifting that out of the inner loop in the C version reduces its running time to 0.27 seconds, making the factor ~1.4.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With