I'm learning Haskell. My interest is to use it for personal computer experimentation. Right now, I'm trying to see how fast Haskell can get. Many claim parity with C(++), and if that is true, I would be very happy (I should note that I will be using Haskell whether or not it's fast, but fast is still a good thing).
My test program implements π(x) with a very simple algorithm: Primes numbers add 1 to the result. Prime numbers have no integer divisors between 1 and √x. This is not an algorithm battle, this is purely for compiler performance.
Haskell seems to be about 6x slower on my computer, which is fine (still 100x faster than pure Python), but that could be just because I'm a Haskell newbie.
Now, my question: How, without changing the algorithm, can I optimize the Haskell implementation? Is Haskell really on performance parity with C?
Here is my Haskell
code:
import System.Environment
-- a simple integer square root
isqrt :: Int -> Int
isqrt = floor . sqrt . fromIntegral
-- primality test
prime :: Int -> Bool
prime x = null [x | q <- [3, 5..isqrt x], rem x q == 0]
main = do
n <- fmap (read . head) getArgs
print $ length $ filter prime (2:[3, 5..n])
Here is my C++
code:
#include <iostream>
#include <cmath>
#include <cstdlib>
using namespace std;
bool isPrime(int);
int main(int argc, char* argv[]) {
int primes = 10000, count = 0;
if (argc > 1) {
primes = atoi(argv[1]);
}
if (isPrime(2)) {
count++;
}
for (int i = 3; i <= primes; i+=2) {
if (isPrime(i)){
count++;
}
}
cout << count << endl;
return 0;
}
bool isPrime(int x){
for (int i = 2; i <= floor(sqrt(x)); i++) {
if (x % i == 0) {
return false;
}
}
return true;
}
Your Haskell version is constructing a lazy list in prime
only to test if it is null. This seems to indeed be a bottle neck. The following version runs just as fast as the C++ version on my machine:
prime :: Int -> Bool
prime x = go 3
where
go q | q <= isqrt x = if rem x q == 0 then False else go (q+2)
go _ = True
3.31s when compiled with -O2 vs. 3.18s for C++ with gcc 4.8 and -O3 for n=5000000.
Of course, 'guessing' where the program is slow to optimize it is not a very good approach. Fortunately, Haskell has good profiling tools on board.
Compiling and running with
$ ghc --make primes.hs -O2 -prof -auto-all -fforce-recomp && ./primes 5000000 +RTS -p
gives
# primes.prof
Thu Feb 20 00:49 2014 Time and Allocation Profiling Report (Final)
primes +RTS -p -RTS 5000000
total time = 5.71 secs (5710 ticks @ 1000 us, 1 processor)
total alloc = 259,580,976 bytes (excludes profiling overheads)
COST CENTRE MODULE %time %alloc
prime.go Main 96.4 0.0
main Main 2.0 84.6
isqrt Main 0.9 15.4
individual inherited
COST CENTRE MODULE no. entries %time %alloc %time %alloc
MAIN MAIN 45 0 0.0 0.0 100.0 100.0
main Main 91 0 2.0 84.6 100.0 100.0
prime Main 92 2500000 0.7 0.0 98.0 15.4
prime.go Main 93 326103491 96.4 0.0 97.3 15.4
isqrt Main 95 0 0.9 15.4 0.9 15.4
--- >8 ---
which clearly shows that prime
is where things get hot. For more information on profiling, I'll refer you to Real World Haskell, Chap 25.
To really understand what is going on, you can look at (one of) GHC's intermediate languages Core, which will show you how the code looks like after optimization. Some good info is at the Haskell wiki. I would not recommend to do that unless necessary, but it is good to know that the possibility exists.
To your other questions:
1) How, without changing the algorithm, can I optimize the Haskell implementation?
Profile, and try to write inner loops so that they don't do any memory allocations and can be made strict by the compiler. Doing so can take some practice and experience.
2) Is Haskell really on performance parity with C?
That depends. GHC is amazing and can often optimize your program very well. If you know what you're doing you can usually get close to the performance of optimized C (100% - 200% of C's speed). That said, these optimizations are not always easy or pretty to the eye and high level Haskell can be slower. But don't forget that you're gaining amazing expressiveness and high level abstractions when using Haskell. It will usually be fast enough for all but the most performance critical applications and even then you can often get pretty close to C with some profiling and performance optimizations.
I dont think that the Haskell version (original and improved by first answer) are equivalent with the C++ version. The reason is this: Both only consider every second element (in the prime function), while the C++ version scans every element (only i++ in the isPrime() function.
When i fix this (change i++ to i+=2 in the isPrime() function for C++) i get down to almost 1/3 of the runtime of the optimized Haskell version (2.1s C++ vs 6s Haskell).
The output remains the same for both (of course). Note that this is no specific opimization of the C++ version ,just an adaptation of the trick already applied in the Haskell version.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With