I am new to Haskell and I am puzzled by the cost of a function call, which seems to be completely unreasonable to me, and makes me think I am doing something fundamentally wrong.
Consider the following Haskell code:
module Main where
logistic x = 4.0*x*(1.0-x)
lg :: Double -> Int -> Double
lg !x 0 = x
lg !x !n = lg (logistic x) (n-1)
main = putStrLn $ show $ lg 0.7861 100000000
Compiling this with the command
ghc -O3 -XBangPatterns -o tsths tst.hs
and running it, I get:
real 0m15.904s
user 0m15.853s
sys 0m0.016s
If instead of calling the function logistic
I calculate the expression inline:
module Main where
lg :: Double -> Int -> Double
lg !x 0 = x
lg !x !n = lg (4.0*x*(1.0-x)) (n-1)
main = putStrLn $ show $ lg 0.7861 100000000
the execution time becomes:
real 0m0.838s
user 0m0.828s
sys 0m0.004s
which is exactly the same as the equivalent C program, which is
#include <stdio.h>
int main() {
int i, num=100000000;
double x=0.7861;
for (i=0; i<num; ++i)
x *= 4.0*(1.0-x);
printf("%lg\n", x);
}
Am I doing something terribly wrong?
Many thanks.
Long story short, the overhead of a direct (non-virtual) function call was approximately 5.5 nanoseconds, or 18 clock cycles, compared to an inline function call. The overhead of a virtual function call was 13.2 nanoseconds, or 42 clock cycles, compared to inline.
1) Function call overhead doesn't occur. 2) It also saves the overhead of push/pop variables on the stack when function is called. 3) It also saves overhead of a return call from a function. 4) When you inline a function, you may enable compiler to perform context specific optimization on the body of function.
Speaking from personal experience, I write code in a proprietary language that is fairly modern in terms of capability, but function calls are ridiculously expensive, to the point where even typical for loops have to be optimized for speed: for(Integer index = 0, size = someList.
Yes method calls slow down the code execution a tiny little bit, if they a not inlined by the c#-compiler or the jit-compiler. However, unless your code runs in a loop and is executed a million times or so, you should really focus on producing clean, understandable and maintainable code.
It's a bug in GHC-7.4.1. Looking at the generated core (only the core for the function lg
is important, from GHC-7.4.2, we get
Main.lg3 :: GHC.Types.Double
[GblId,
Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=0, Value=False,
ConLike=False, Cheap=False, Expandable=False,
Guidance=IF_ARGS [] 30 0}]
Main.lg3 = GHC.Float.$w$cfromRational Main.lg4 Main.lg2
Main.lg1 :: GHC.Types.Double
[GblId,
Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=0, Value=False,
ConLike=False, Cheap=False, Expandable=False,
Guidance=IF_ARGS [] 30 0}]
Main.lg1 = GHC.Float.$w$cfromRational Main.lg2 Main.lg2
Main.$wlg :: GHC.Prim.Double# -> GHC.Prim.Int# -> GHC.Prim.Double#
[GblId,
Arity=2,
Str=DmdType LL,
Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=2, Value=True,
ConLike=True, Cheap=True, Expandable=True,
Guidance=IF_ARGS [0 30] 158 0}]
Main.$wlg =
\ (ww_s1Oy :: GHC.Prim.Double#) (ww1_s1OC :: GHC.Prim.Int#) ->
case ww1_s1OC of ds_Xvs {
__DEFAULT ->
case Main.lg3 of _ { GHC.Types.D# x_awJ ->
case Main.lg1 of _ { GHC.Types.D# x1_awV ->
letrec {
$wlg1_X1PF [Occ=LoopBreaker]
:: GHC.Prim.Double# -> GHC.Prim.Int# -> GHC.Prim.Double#
[LclId, Arity=2, Str=DmdType LL]
$wlg1_X1PF =
\ (ww2_X1Pv :: GHC.Prim.Double#) (ww3_X1PA :: GHC.Prim.Int#) ->
case ww3_X1PA of ds1_Xwr {
__DEFAULT ->
$wlg1_X1PF
(GHC.Prim.*##
(GHC.Prim.*## x_awJ ww2_X1Pv) (GHC.Prim.-## x1_awV ww2_X1Pv))
(GHC.Prim.-# ds1_Xwr 1);
0 -> ww2_X1Pv
}; } in
$wlg1_X1PF
(GHC.Prim.*##
(GHC.Prim.*## x_awJ ww_s1Oy) (GHC.Prim.-## x1_awV ww_s1Oy))
(GHC.Prim.-# ds_Xvs 1)
}
};
0 -> ww_s1Oy
}
two top-level Double
s and a decent loop.
GHC-7.4.1 was a bit too inlining-happy, that produced
Rec {
Main.$wlg [Occ=LoopBreaker]
:: GHC.Prim.Double# -> GHC.Prim.Int# -> GHC.Prim.Double#
[GblId, Arity=2, Str=DmdType LL]
Main.$wlg =
\ (ww_s1NS :: GHC.Prim.Double#) (ww1_s1NW :: GHC.Prim.Int#) ->
case ww1_s1NW of ds_Xvb {
__DEFAULT ->
case GHC.Float.$wfromRat'' (-1021) 53 Main.logistic4 Main.logistic2
of ww2_a1Mt { __DEFAULT ->
case GHC.Float.$wfromRat'' (-1021) 53 Main.logistic2 Main.logistic2
of ww3_X1Nq { __DEFAULT ->
Main.$wlg
(GHC.Prim.*##
(GHC.Prim.*## ww2_a1Mt ww_s1NS) (GHC.Prim.-## ww3_X1Nq ww_s1NS))
(GHC.Prim.-# ds_Xvb 1)
}
};
0 -> ww_s1NS
}
end Rec }
and gave you two calls to the fromRational
worker in each iteration.
Now, fromRational
is a fairly complicated function. It is still rather slow, despite having gotten a much faster implementation in the 7.2 series, so these calls hurt big time.
With a type signature, there are no Rational
top-level constants produced, only Double
constants, and these are then used, which of course doesn't include a gratuitous slowdown.
As suggested by Dan Burton it is actually overhead of polymorphic function, because GHC infers type logistic :: Fractional a => a -> a
. If you specify type explicitly you generally enable both better checking and better optimizations. I believe it is good practice to specify type of function explicitly.
If you want to have function with polymorphic type but have full speed of monomorphic call in case of specific usage you can use SPECIALIZE
pragma, but I believe this is GHC specific.
{-# LANGUAGE BangPatterns #-}
module Main where
logistic :: Fractional a => a -> a
{-# SPECIALISE logistic :: Double -> Double #-}
logistic x = 4.0*x*(1.0-x)
lg :: Double -> Int -> Double
lg !x 0 = x
lg !x !n = lg (logistic x) (n-1)
main = putStrLn $ show $ lg 0.7861 100000000
Also note that you can specify LANGUAGE
pragma at the beginning of file to enable bang patterns and don't need to enable them on command line.
Times on my machine were 21 seconds for original, 0.67 sec for explicit type, 0.7 sec for specialize (which is basically the same).
I believe overhead of specialized call is very small because it is just bunch of instructions which gets inlined anyway but polymorphic function results in call. Though it is weird that GHC cant inline despite polymorphism.
Add a type signature to logistic
and you will see the speedup. Allow me to use CPP to demonstrate the difference.
bash> cat tst.hs
module Main where
#if defined(SIG)
logistic :: Double -> Double
#endif
logistic x = 4.0*x*(1.0-x)
lg :: Double -> Int -> Double
lg !x 0 = x
lg !x !n = lg (logistic x) (n-1)
main = putStrLn $ show $ lg 0.7861 100000000
bash> ghc --version
The Glorious Glasgow Haskell Compilation System, version 7.4.1
If compiled without SIG defined (the type signature is excluded):
bash> ghc -O3 -XBangPatterns -XCPP -o tsths tst.hs
[1 of 1] Compiling Main ( tst.hs, tst.o )
Linking tsths ...
bash> time ./tsths
0.34209286442469333
real 0m13.187s
user 0m13.177s
sys 0m0.008s
Now lets compile with SIG defined so the signature is included:
bash> rm tsths *.o *.hi
bash> ghc -O3 -XBangPatterns -XCPP -DSIG -o tsths tst.hs
[1 of 1] Compiling Main ( tst.hs, tst.o )
Linking tsths ...
bash> time ./tsths
0.34209286442469333
real 0m0.464s
user 0m0.440s
sys 0m0.020s
Not sure why GHC doesn't optimize it without the signature; the monomorphism restriction should restrict it to Double -> Double
anyways.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With