If I want to call more than one C function, each one depending on the result of the previous one, is it better to create a wrapper C function that handles the three calls? Will it cost the same as using Haskell FFI without converting types?
Suppose I have the following Haskell code:
foo :: CInt -> IO CInt
foo x = do
a <- cfA x
b <- cfB a
c <- cfC c
return c
Each function cf*
is a C call.
Is it better, in terms of performance, to create a single C function like cfABC
and make only one foreign call in Haskell?
int cfABC(int x) {
int a, b, c;
a = cfA(x);
b = cfB(a);
c = cfC(b);
return c;
}
Haskell code:
foo :: CInt -> IO CInt
foo x = do
c <- cfABC x
return c
How to measure the performace cost of a C call from Haskell? Not the cost of the C function itself, but the cost of the "context-switching" from Haskell to C and back.
The answer depends mostly on whether the foreign call is a safe
or an unsafe
call.
An unsafe
C call is basically just a function call, so if there's no (nontrivial) type conversion, there are three function calls if you make three foreign calls, and between one and four when you write a wrapper in C, depending on how many of the component functions can be inlined when compiling the C, since a foreign call into C cannot be inlined by GHC. Such a function call is generally very cheap (it's just a copy of the arguments and a jump to the code), so the difference is small either way, the wrapper should be slightly slower when no C function can be inlined into the wrapper, and slightly faster when all can be inlined [and that was indeed the case in my benchmarking, +1.5ns resp. -3.5ns where the three foreign calls took about 12.7ns for everything just returning the argument]. If the functions do something nontrivial, the difference is negligible (and if they're not doing anything nontrivial, you'd probably better write them in Haskell to let GHC inline the code).
A safe
C call involves saving some nontrivial amount of state, locking, possibly spawning a new OS thread, so that takes much longer. Then the small overhead of perhaps calling one function more in C is negligible compared to the cost of the foreign calls [unless passing the arguments requires an unusual amount of copying, many huge struct
s or so]. In my do-nothing benchmark
{-# LANGUAGE ForeignFunctionInterface #-}
module Main (main) where
import Criterion.Main
import Foreign.C.Types
import Control.Monad
foreign import ccall safe "funcs.h cfA" c_cfA :: CInt -> IO CInt
foreign import ccall safe "funcs.h cfB" c_cfB :: CInt -> IO CInt
foreign import ccall safe "funcs.h cfC" c_cfC :: CInt -> IO CInt
foreign import ccall safe "funcs.h cfABC" c_cfABC :: CInt -> IO CInt
wrap :: (CInt -> IO CInt) -> Int -> IO Int
wrap foo arg = fmap fromIntegral $ foo (fromIntegral arg)
cfabc = wrap c_cfABC
foo :: Int -> IO Int
foo = wrap (c_cfA >=> c_cfB >=> c_cfC)
main :: IO ()
main = defaultMain
[ bench "three calls" $ foo 16
, bench "single call" $ cfabc 16
]
where all the C functions just return the argument, the mean for the single wrapped call is a bit above 100ns [105-112], and for the three separate calls around 300ns [290-315].
So a safe
c call takes roughly 100ns and usually, it is then faster to wrap them up into a single call. But still, if the called functions do something sufficiently nontrivial, the difference won't matter.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With