Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Simple π(x) in Haskell vs C++

I'm learning Haskell. My interest is to use it for personal computer experimentation. Right now, I'm trying to see how fast Haskell can get. Many claim parity with C(++), and if that is true, I would be very happy (I should note that I will be using Haskell whether or not it's fast, but fast is still a good thing).

My test program implements π(x) with a very simple algorithm: Primes numbers add 1 to the result. Prime numbers have no integer divisors between 1 and √x. This is not an algorithm battle, this is purely for compiler performance.

Haskell seems to be about 6x slower on my computer, which is fine (still 100x faster than pure Python), but that could be just because I'm a Haskell newbie.

Now, my question: How, without changing the algorithm, can I optimize the Haskell implementation? Is Haskell really on performance parity with C?

Here is my Haskell code:

import System.Environment

-- a simple integer square root
isqrt :: Int -> Int
isqrt = floor . sqrt . fromIntegral

-- primality test        
prime :: Int -> Bool
prime x = null [x | q <- [3, 5..isqrt x], rem x q == 0]

main = do
  n <- fmap (read . head) getArgs
  print $ length $ filter prime (2:[3, 5..n])

Here is my C++ code:

#include <iostream>
#include <cmath>
#include <cstdlib>
using namespace std;

bool isPrime(int);

int main(int argc, char* argv[]) {
    int primes = 10000, count = 0;
    if (argc > 1) {
        primes = atoi(argv[1]);
    }
    if (isPrime(2)) {
        count++;
    }
    for (int i = 3; i <= primes; i+=2) {
        if (isPrime(i)){
            count++;
        }
    }
    cout << count << endl;
    return 0;
}

bool isPrime(int x){
    for (int i = 2; i <= floor(sqrt(x)); i++) {
        if (x % i == 0) {
            return false;
        }
    }
    return true;
}
like image 601
PythonNut Avatar asked Feb 19 '14 22:02

PythonNut


2 Answers

Your Haskell version is constructing a lazy list in prime only to test if it is null. This seems to indeed be a bottle neck. The following version runs just as fast as the C++ version on my machine:

prime :: Int -> Bool
prime x = go 3
  where
    go q | q <= isqrt x = if rem x q == 0 then False else go (q+2)
    go _  = True

3.31s when compiled with -O2 vs. 3.18s for C++ with gcc 4.8 and -O3 for n=5000000.

Of course, 'guessing' where the program is slow to optimize it is not a very good approach. Fortunately, Haskell has good profiling tools on board.

Compiling and running with

$ ghc --make primes.hs -O2 -prof -auto-all -fforce-recomp && ./primes 5000000 +RTS -p

gives

# primes.prof
  Thu Feb 20 00:49 2014 Time and Allocation Profiling Report  (Final)

     primes +RTS -p -RTS 5000000

  total time  =        5.71 secs   (5710 ticks @ 1000 us, 1 processor)
  total alloc = 259,580,976 bytes  (excludes profiling overheads)

COST CENTRE MODULE  %time %alloc

prime.go    Main     96.4    0.0
main        Main      2.0   84.6
isqrt       Main      0.9   15.4

                                                      individual     inherited
COST CENTRE MODULE                  no.     entries  %time %alloc   %time %alloc

MAIN        MAIN                     45           0    0.0    0.0   100.0  100.0
 main       Main                     91           0    2.0   84.6   100.0  100.0
  prime     Main                     92     2500000    0.7    0.0    98.0   15.4
   prime.go Main                     93   326103491   96.4    0.0    97.3   15.4
    isqrt   Main                     95           0    0.9   15.4     0.9   15.4

--- >8 ---

which clearly shows that prime is where things get hot. For more information on profiling, I'll refer you to Real World Haskell, Chap 25.

To really understand what is going on, you can look at (one of) GHC's intermediate languages Core, which will show you how the code looks like after optimization. Some good info is at the Haskell wiki. I would not recommend to do that unless necessary, but it is good to know that the possibility exists.

To your other questions:

1) How, without changing the algorithm, can I optimize the Haskell implementation?

Profile, and try to write inner loops so that they don't do any memory allocations and can be made strict by the compiler. Doing so can take some practice and experience.

2) Is Haskell really on performance parity with C?

That depends. GHC is amazing and can often optimize your program very well. If you know what you're doing you can usually get close to the performance of optimized C (100% - 200% of C's speed). That said, these optimizations are not always easy or pretty to the eye and high level Haskell can be slower. But don't forget that you're gaining amazing expressiveness and high level abstractions when using Haskell. It will usually be fast enough for all but the most performance critical applications and even then you can often get pretty close to C with some profiling and performance optimizations.

like image 102
Paul Avatar answered Nov 08 '22 12:11

Paul


I dont think that the Haskell version (original and improved by first answer) are equivalent with the C++ version. The reason is this: Both only consider every second element (in the prime function), while the C++ version scans every element (only i++ in the isPrime() function.

When i fix this (change i++ to i+=2 in the isPrime() function for C++) i get down to almost 1/3 of the runtime of the optimized Haskell version (2.1s C++ vs 6s Haskell).

The output remains the same for both (of course). Note that this is no specific opimization of the C++ version ,just an adaptation of the trick already applied in the Haskell version.

like image 2
Lazarus535 Avatar answered Nov 08 '22 12:11

Lazarus535