Comparing speed of Haskell and C for the computation of primes

Question

I initially wrote this (brute force and inefficient) method of calculating primes with the intent of making sure that there was no difference in speed between using "if-then-else" versus guards in Haskell (and there is no difference!). But then I decided to write a C program to compare and I got the following (Haskell slower by just over 25%) :

(Note I got the ideas of using rem instead of mod and also the O3 option in the compiler invocation from the following post : On improving Haskell's performance compared to C in fibonacci micro-benchmark)

Haskell : Forum.hs

divisibleRec :: Int -> Int -> Bool
divisibleRec i j 
  | j == 1         = False 
  | i `rem` j == 0 = True 
  | otherwise      = divisibleRec i (j-1)

divisible::Int -> Bool
divisible i = divisibleRec i (i-1)

r = [ x | x <- [2..200000], divisible x == False]

main :: IO()
main = print(length(r))

C : main.cpp

#include <stdio.h>

bool divisibleRec(int i, int j){
  if(j==1){ return false; }
  else if(i%j==0){ return true; }
  else{ return divisibleRec(i,j-1); }
}

bool divisible(int i){ return divisibleRec(i, i-1); }

int main(void){
  int i, count =0;
  for(i=2; i<200000; ++i){
    if(divisible(i)==false){
      count = count+1;
    }
  }
  printf("number of primes = %d
",count);
  return 0;
}

The results I got were as follows :

Compilation times

time (ghc -O3 -o runProg Forum.hs)
real    0m0.355s
user    0m0.252s
sys 0m0.040s

time (gcc -O3 -o runProg main.cpp)
real    0m0.070s
user    0m0.036s
sys 0m0.008s

and the following running times :

Running times on Ubuntu 32 bit

Haskell
17984
real    0m54.498s
user    0m51.363s
sys 0m0.140s


C++
number of primes = 17984
real    0m41.739s
user    0m39.642s
sys 0m0.080s

I was quite impressed with the running times of Haskell. However my question is this : can I do anything to speed up the haskell program without :

Changing the underlying algorithm (it is clear that massive speedups can be gained by changing the algorithm; but I just want to understand what I can do on the language/compiler side to improve performance)
Invoking the llvm compiler (because I dont have this installed)

[EDIT : Memory usage]

After a comment by Alan I noticed that the C program uses a constant amount of memory where as the Haskell program slowly grows in memory size. At first I thought this had something to do with recursion, but gspr explains below why this is happening and provides a solution. Will Ness provides an alternative solution which (like gspr's solution) also ensures that the memory remains static.

[EDIT : Summary of bigger runs]

max number tested : 200,000:

(54.498s/41.739s) = Haskell 30.5% slower

max number tested : 400,000:

3m31.372s/2m45.076s = 211.37s/165s = Haskell 28.1% slower

max number tested : 800,000:

14m3.266s/11m6.024s = 843.27s/666.02s = Haskell 26.6% slower

[EDIT : Code for Alan]

This was the code that I had written earlier which does not have recursion and which I had tested on 200,000 :

#include <stdio.h>

bool divisibleRec(int i, int j){
  while(j>0){
    if(j==1){ return false; }
    else if(i%j==0){ return true; }
    else{ j -= 1;}
  }
}


bool divisible(int i){ return divisibleRec(i, i-1); }

int main(void){
  int i, count =0;
  for(i=2; i<8000000; ++i){
    if(divisible(i)==false){
      count = count+1;
    }
  }
  printf("number of primes = %d
",count);
  return 0;
}

The results for the C code with and without recursion are as follows (for 800,000) :

With recursion : 11m6.024s

Without recursion : 11m5.328s

Note that the executable seems to take up 60kb (as seen in System monitor) irrespective of the maximum number, and therefore I suspect that the compiler is detecting this recursion.

gspr · Accepted Answer

This isn't really answering your question, but rather what you asked in a comment regarding growing memory usage when the number 200000 grows.

When that number grows, so does the list r. Your code needs all of r at the very end, to compute its length. The C code, on the other hand, just increments a counter. You'll have to do something similar in Haskell too if you want constant memory usage. The code will still be very Haskelly, and in general it's a sensible proposition: you don't really need the list of numbers for which divisible is False, you just need to know how many there are.

You can try with

main :: IO ()
main = print $ foldl' (\s x -> if divisible x then s else s+1) 0 [2..200000]

(foldl' is a stricter foldl from Data.List that avoids thunks being built up).

Thomas M. DuBuisson · Answer

Well bang patters give you a very small win (as does llvm, but you seem to have expected that):

{-# LANUGAGE BangPatterns #-}

divisibleRec !i !j | j == 1         = False

And on my x86-64 I get a very big win by switching to smaller representations, such as Word32:

divisibleRec :: Word32 -> Word32 -> Bool
...
divisible :: Word32 -> Bool

My timings:

$ time ./so             -- Int
2262

real    0m2.332s

$ time ./so              -- Word32
2262

real    0m1.424s

This is a closer match to your C program, which is only using int. It still doesn't match performance wise, I suspect we'd have to look at core to figure out why.

EDIT: and the memory use, as was already noted I see, is about the named list r. I just inlined r, made it output a 1 for each non-divisble value and took the sum:

main = print $ sum $ [ 1 | x <- [2..800000], not (divisible x) ]

Will Ness · Answer

Another way to write down your algorithm is

main = print $ length [()|x<-[2..200000], and [rem x d>0|d<-[x-1,x-2..2]]]

Unfortunately, it runs slower. Using all ((>0).rem x) [x-1,x-2..2] as a test, it runs slower still. But maybe you'd test it on your setup nevertheless.

Replacing your code with explicit loop with bang patterns made no difference whatsoever:

{-# OPTIONS_GHC -XBangPatterns #-}
r4::Int->Int
r4 n = go 0 2 where
  go !c i | i>n = c
          | True = go (if not(divisible i) then (c+1) else c) (i+1)

divisibleRec::Int->Int->Bool
divisibleRec i !j | j == 1 = False 
                  | i `rem` j == 0 = True 
                  | otherwise = divisibleRec i (j-1)

Comparing speed of Haskell and C for the computation of primes

Tags:

c

optimization

math

haskell

ghc

artella

3 Answers

gspr

Thomas M. DuBuisson

Will Ness

Recent Activity

Donate For Us

Comparing speed of Haskell and C for the computation of primes

Tags:

c

optimization

math

haskell

ghc

artella

3 Answers

gspr

Thomas M. DuBuisson

Will Ness

Related questions

Recent Activity

Donate For Us