In Haskell, to define an instance of a type class you need to supply a dictionary of functions required by the type class. For example, to define an instance of Bounded, you need to supply definitions for minBound and maxBound.
For the purpose of this question, let's call this dictionary the vtbl for the type class instance. Let me know if this is a poor analogy.
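To make the "dictionary" idea concrete, here is a minimal sketch using a hypothetical three-valued type Tri (the type and instance are illustrative, not from the question):

```haskell
-- A hypothetical enumeration; the instance below supplies the
-- "dictionary" that Bounded requires: one slot for minBound and
-- one for maxBound.
data Tri = Lo | Mid | Hi
  deriving (Show, Eq)

instance Bounded Tri where
  minBound = Lo
  maxBound = Hi

main :: IO ()
main = print (minBound :: Tri, maxBound :: Tri)
```

At a call site like `minBound :: Tri`, the question is whether GHC passes this dictionary around at runtime or resolves the slot statically.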
My question centers on what kind of code generation I can expect from GHC when I call a type class function. I see three possibilities: the dictionary lookup happens at runtime; the lookup is resolved at compile time and a direct call to the implementation is generated; or the implementation is inlined at the call site.
I'd like to understand when each of these occur - or if there are other possibilities.
Also, does it matter if the type class was defined in a separately compiled module as opposed to being part of the "main" compilation unit?
In a runnable program it seems that Haskell knows the types of all the functions and expressions in the program. Therefore, when I call a type class function the compiler should know what the vtbl is and exactly which implementation function to call. I would expect the compiler to at least generate a direct call to the implementation function. Is this true?
(I say "runnable program" here to distinguish it from compiling a module which you don't run.)
As with all good questions, the answer is "it depends". The rule of thumb is that there's a runtime cost to any typeclass-polymorphic code. However, library authors have a lot of flexibility in eliminating this cost with GHC's rewrite rules, and in particular there is a {-# SPECIALIZE #-} pragma that can automatically create monomorphic versions of polymorphic functions and use them whenever the polymorphic function can be inferred to be used at the monomorphic type. (The price for doing this is library and executable size, I think.)
You can answer your question for any particular code segment using GHC's -ddump-simpl flag. For example, here's a short Haskell file:
vDouble :: Double
vDouble = 3
vInt = length [2..5]
main = print (vDouble + realToFrac vInt)
Without optimizations, you can see that GHC does the dictionary lookup at runtime:
Main.main :: GHC.Types.IO ()
[GblId]
Main.main =
System.IO.print
@ GHC.Types.Double
GHC.Float.$fShowDouble
(GHC.Num.+
@ GHC.Types.Double
GHC.Float.$fNumDouble
(GHC.Types.D# 3.0)
(GHC.Real.realToFrac
@ GHC.Types.Int
@ GHC.Types.Double
GHC.Real.$fRealInt
GHC.Float.$fFractionalDouble
(GHC.List.length
@ GHC.Integer.Type.Integer
(GHC.Enum.enumFromTo
@ GHC.Integer.Type.Integer
GHC.Enum.$fEnumInteger
(__integer 2)
(__integer 5)))))
...the relevant bit being realToFrac @Int @Double. At -O2, on the other hand, you can see it did the dictionary lookup statically and inlined the implementation, the result being a single call to int2Double#:
Main.main2 =
case GHC.List.$wlen @ GHC.Integer.Type.Integer Main.main3 0
of ww_a1Oq { __DEFAULT ->
GHC.Float.$w$sshowSignedFloat
GHC.Float.$fShowDouble_$sshowFloat
GHC.Show.shows26
(GHC.Prim.+## 3.0 (GHC.Prim.int2Double# ww_a1Oq))
(GHC.Types.[] @ GHC.Types.Char)
}
It's also possible for a library author to choose to rewrite the polymorphic function to a call to a monomorphic one but not inline the implementation of the monomorphic one; this means that all of the possibilities you proposed (and more) are possible.
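The "specialise but don't inline" combination can be sketched with pragmas. This is an illustrative module (the norm function is hypothetical, not from the answer), showing one way a library author might arrange it:

```haskell
-- A polymorphic function a library might export.
norm :: Num a => [a] -> a
norm = sum . map (\x -> x * x)

-- SPECIALIZE asks GHC to compile a monomorphic Double copy of norm
-- plus a rewrite rule redirecting Double call sites to that copy;
-- NOINLINE keeps the body as an out-of-line call rather than letting
-- GHC inline it at each use site.
{-# SPECIALIZE norm :: [Double] -> Double #-}
{-# NOINLINE norm #-}

main :: IO ()
main = print (norm [3, 4 :: Double])
```

The pragmas change only the generated code, not the semantics; -ddump-simpl on this module shows whether call sites go through the dictionary, the specialised copy, or an inlined body.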
If the compiler can "tell", at compile-time, what actual type you're using, then the method lookup happens at compile-time. Otherwise it happens at run-time. If lookup happens at compile-time, the method code may be inlined, depending on how large the method is. (This goes for regular functions too: If the compiler knows which function you're calling, it will inline it if that function is "small enough".)
Consider, for example, (sum [1 .. 10]) :: Integer. Here the compiler statically knows that the list is a list of Integer values, so it can inline the + function for Integer. On the other hand, if you do something like
foo :: Num x => [x] -> x
foo xs = sum xs - head xs
then, when you call sum, the compiler doesn't know what type you're using (it depends on what type is given to foo), so it can't do any compile-time lookup.
On the other hand, using the {-# SPECIALIZE #-} pragma, you can do something like

{-# SPECIALIZE foo :: [Int] -> Int #-}
What this does is tell the compiler to compile a special version of foo where the input is a list of Int values. This obviously means that for that version, the compiler can do all the method lookups at compile time (and almost certainly inline them all). Now there are two versions of foo: one which works for any type and does run-time dictionary lookups, and one that works only for Int, but is [probably] much faster.
When you call the foo function, the compiler has to decide which version to call. If the compiler can "tell", at compile time, that you want the Int version, it will do that. If it can't "tell" what type you're going to use, it'll use the slower any-type version.
Note that you can have multiple specialisations of a single function. For example, you could do
{-# SPECIALIZE foo :: [Int] -> Int #-}
{-# SPECIALIZE foo :: [Double] -> Double #-}
{-# SPECIALIZE foo :: [Complex Double] -> Complex Double #-}
Now, whenever the compiler can tell that you're using one of these types, it'll use the version hard-coded for that type. But if the compiler can't tell what type you're using, it'll never use the specialised versions, and always the polymorphic one. (That might mean that you need to specialise the function(s) that call foo, for example.)
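The point about specialising callers can be sketched like this (the bar function is a hypothetical caller added for illustration):

```haskell
-- foo from the answer, specialised at Int.
foo :: Num x => [x] -> x
foo xs = sum xs - head xs
{-# SPECIALIZE foo :: [Int] -> Int #-}

-- bar calls foo at a type it doesn't know, so inside the generic
-- bar the call to foo must go through the dictionary. Giving bar
-- its own SPECIALIZE lets GHC use the Int copy of foo inside the
-- Int copy of bar.
bar :: Num x => [x] -> x
bar xs = foo xs + 1
{-# SPECIALIZE bar :: [Int] -> Int #-}

main :: IO ()
main = print (bar [10, 20, 30 :: Int])
```

Without the second pragma, a monomorphic call to bar still reaches only the polymorphic foo, which is why specialisation sometimes has to propagate up the call chain.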
If you crawl around the compiler's Core output, you can probably figure out exactly what it did in any particular circumstance. You will probably go stark raving mad though...