I want to ask you if Haskell and C++ compilers can optimize function calls the same way. Please look at following codes. In the following example Haskell is significantly faster than C++.
I have heard that Haskell can compile to LLVM and can be optimized by the LLVM passes. Additionally I have heard that Haskell has some heavy optimizations under the hood. But the following examples should be able to work with the same performance. I want to ask:
(I'm using LLVM-3.2 and GHC-7.6).
C++ code:
#include <cstdio>
#include <cstdlib>
int b(const int x){
return x+5;
}
int c(const int x){
return b(x)+1;
}
int d(const int x){
return b(x)-1;
}
int a(const int x){
return c(x) + d(x);
}
int main(int argc, char* argv[]){
printf("Starting...\n");
long int iternum = atol(argv[1]);
long long int out = 0;
for(long int i=1; i<=iternum;i++){
out += a(iternum-i);
}
printf("%lld\n",out);
printf("Done.\n");
}
compiled with clang++ -O3 main.cpp
haskell code:
module Main where
import qualified Data.Vector as V
import System.Environment
b :: Int -> Int
b x = x + 5
c x = b x + 1
d x = b x - 1
a x = c x + d x
main = do
putStrLn "Starting..."
args <- getArgs
let iternum = read (head args) :: Int in do
putStrLn $ show $ V.foldl' (+) 0 $ V.map (\i -> a (iternum-i))
$ V.enumFromTo 1 iternum
putStrLn "Done."
compiled with ghc -O3 --make -fforce-recomp -fllvm ghc-test.hs
speed results:
Running testcase for program 'cpp/a.out'
-------------------
cpp/a.out 100000000 0.0% avg time: 105.05 ms
cpp/a.out 200000000 11.11% avg time: 207.49 ms
cpp/a.out 300000000 22.22% avg time: 309.22 ms
cpp/a.out 400000000 33.33% avg time: 411.7 ms
cpp/a.out 500000000 44.44% avg time: 514.07 ms
cpp/a.out 600000000 55.56% avg time: 616.7 ms
cpp/a.out 700000000 66.67% avg time: 718.69 ms
cpp/a.out 800000000 77.78% avg time: 821.32 ms
cpp/a.out 900000000 88.89% avg time: 923.18 ms
cpp/a.out 1000000000 100.0% avg time: 1025.43 ms
Running testcase for program 'hs/main'
-------------------
hs/main 100000000 0.0% avg time: 70.97 ms (diff: 34.08)
hs/main 200000000 11.11% avg time: 138.95 ms (diff: 68.54)
hs/main 300000000 22.22% avg time: 206.3 ms (diff: 102.92)
hs/main 400000000 33.33% avg time: 274.31 ms (diff: 137.39)
hs/main 500000000 44.44% avg time: 342.34 ms (diff: 171.73)
hs/main 600000000 55.56% avg time: 410.65 ms (diff: 206.05)
hs/main 700000000 66.67% avg time: 478.25 ms (diff: 240.44)
hs/main 800000000 77.78% avg time: 546.39 ms (diff: 274.93)
hs/main 900000000 88.89% avg time: 614.12 ms (diff: 309.06)
hs/main 1000000000 100.0% avg time: 682.32 ms (diff: 343.11)
EDIT Of course we cannot compare speed of languages, but the speed of implementiations.
But I'm curious if Ghc and C++ compilers can optimize function calls the same way
I've edited the question with new benchmark and codes based on your help :)
If your goal is to get this running as quickly as your C++ compiler, then you would want to use a data structure that the compiler can have its way with.
module Main where
import qualified Data.Vector as V
b :: Int -> Int
b x = x + 5
c x = b x + 1
d x = b x - 1
a x = c x + d x
main = do
putStrLn "Starting..."
putStrLn $ show $ V.foldl' (+) 0 $ V.map a $ V.enumFromTo 1 100000000
putStrLn "Done."
GHC is able to completely eliminate the loop and just inserts a constant into the resulting assembly. On my computer, this now has a runtime of < 0.002s, when using the same optimization flags as you originally specified.
As a follow up based on the comments by @Yuras, the core produced by the vector based solution and the stream-fusion solution are functionally identical.
main_$s$wfoldlM'_loop [Occ=LoopBreaker]
:: Int# -> Int# -> Int#
main_$s$wfoldlM'_loop =
\ (sc_s2hW :: Int#) (sc1_s2hX :: Int#) ->
case <=# sc1_s2hX 100000000 of _ {
False -> sc_s2hW;
True ->
main_$s$wfoldlM'_loop
(+#
sc_s2hW
(+#
(+# (+# sc1_s2hX 5) 1)
(-# (+# sc1_s2hX 5) 1)))
(+# sc1_s2hX 1)
}
$wloop_foldl [Occ=LoopBreaker]
:: Int# -> Int# -> Int#
$wloop_foldl =
\ (ww_s1Rm :: Int#) (ww1_s1Rs :: Int#) ->
case ># ww1_s1Rs 100000000 of _ {
False ->
$wloop_foldl
(+#
ww_s1Rm
(+#
(+# (+# ww1_s1Rs 5) 1)
(-# (+# ww1_s1Rs 5) 1)))
(+# ww1_s1Rs 1);
True -> ww_s1Rm
}
The only real difference is the choice of comparison operation for the termination condition. Both versions compile to tight tail recursive loops that can be easily optimized by LLVM.
ghc doesn't fuse lists (avoiding success at all costs?)
Here is version that uses stream-fusion package:
module Main where
import Prelude hiding (map, foldl)
import Data.List.Stream
import Data.Stream (enumFromToInt, unstream)
import Text.Printf
import Control.Exception
import System.CPUTime
b :: Int -> Int
b x = x + 5
c x = b x + 1
d x = b x - 1
a x = c x + d x
main = do
putStrLn "Starting..."
putStrLn $ show $ foldl (+) 0 $ map (\z -> a z) $ unstream $ enumFromToInt 1 100000000
putStrLn "Done."
I don't have llvm installed to compare with your results, but it is 10x faster then your version (compiled without llvm).
I think vector fusion should perform even faster.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With