I'd like to understand how simple arithmetic expressions are compiled into GHC 9.0.1 Core.
For this purpose, I've isolated the same declaration,
f = \x y -> sqrt x + y
and compiled it at two different types:
f :: Double -> Double -> Double
f :: Floating a => a -> a -> a
resulting in these very different beasts:
\ (x [Dmd=<S,1*U(U)>] :: Double) (y [Dmd=<S,1*U(U)>] :: Double) ->
case x of { D# x ->
case y of { D# y -> D# (+## (sqrtDouble# x) y) }
}
in the monomorphic case and
\ (@a)
($dFloating_a22t [Dmd=<S(S(S(C(C(S))LLLLLL)LLL)LLLLLLLLLLLLLLLLLLLLLL),U(1*U(1*U(1*C1(C1(U)),A,A,A,A,A,A),A,A,A),A,A,A,1*C1(U),A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A)>]
:: Floating a)
(eta_B0 :: a)
(eta_B1 :: a) ->
+ @a
($p1Fractional @a ($p1Floating @a $dFloating_a22t))
(sqrt @a $dFloating_a22t eta_B0)
eta_B1
in the polymorphic one.
Besides the demand/strictness annotations (the Dmd=... stuff), which deserve a separate question, I'm curious to understand the different shapes of the two resulting expressions. In the monomorphic case we do a nested case match, but in the polymorphic one we only see the operators applied to type variables (@a) and constraints ($p1Fractional etc.).
I guess I should add that the above output is obtained via a custom plugin pass that runs at the very end of the preexisting CoreToDo list, so after desugaring etc. (that's about all I know about this subject btw)
Is there a GHC stage in which these two compiled versions are structurally identical?
Internally a Double is a data constructor with one field containing a primitive Double# in the underlying machine's format (a 64-bit word).
data Double = D# Double#
This is a boxed representation, so a Double may also be a thunk. It is the role of a case expression to evaluate any thunk and then access the underlying machine Double#.
Hence, your function Double -> Double -> Double, after simplification, compiles to a function that matches on the arguments and performs primitive operations on the underlying values.
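As an illustration, the Core you got for the monomorphic f can be written out by hand in source Haskell (a sketch using GHC's primop exports; the real Core is generated by the compiler, not written like this):

```haskell
{-# LANGUAGE MagicHash #-}
import GHC.Exts (Double (D#), sqrtDouble#, (+##))

-- Hand-written analogue of the Core for the monomorphic f:
-- each case forces a possible thunk and exposes the raw Double#,
-- the primops work on the unboxed values, and D# reboxes the result.
f :: Double -> Double -> Double
f x y = case x of
  D# x# -> case y of
    D# y# -> D# (sqrtDouble# x# +## y#)
```

Compiling this with -ddump-simpl should give essentially the same Core as the original boxed definition, since the simplifier arrives at this shape on its own.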
Type classes are compiled to dictionaries.
class Floating a where
sqrt :: a -> a
...
-- becomes a record --
data Floating a = MkFloating {
sqrt :: a -> a,
... }
Polymorphic functions become dictionary-passing functions:
Floating a => a -> a -> a
-- becomes --
Floating a -> a -> a -> a
Method invocations become field accesses:
sqrt x
-- becomes --
sqrt floatingDict x
-- where floatingDict :: Floating a is constructed from dictionary
-- parameters in scope and from instances (which are compiled to
-- dictionaries or functions on dictionaries)
(NB: this is somewhat specific to GHC; other Haskell compilers, though they are rare, may do things differently.)
Is there a GHC stage in which these two compiled versions are structurally identical?
No, why would they be?
The two functions:
f :: Double -> Double -> Double
f = \x y -> sqrt x + y
g :: Floating a => a -> a -> a
g = \x y -> sqrt x + y
diverge very early in the compilation process. Using GHC 8.6.5 with -O0, you can see that the term-level representations are identical after type checking (-ddump-tc):
-- ghc -ddump-tc -dsuppress-all -dsuppress-uniques -fforce-recomp -O0
g = \ x y -> sqrt x + y
f = \ x y -> sqrt x + y
but have already diverged after desugaring, before the "simple optimizer" pass (-ddump-ds-preopt):
-- ghc -ddump-ds-preopt -dsuppress-all -dsuppress-uniques -fforce-recomp -O0
g = \ @ a $dFloating ->
let {
$dFloating = $dFloating } in
let {
$dFractional = $p1Floating $dFloating } in
let {
$dNum = $p1Fractional $dFractional } in
let {
$dNum = $dNum } in
\ x y -> + $dNum (sqrt $dFloating x) y
$dNum = $fNumDouble
$dFloating = $fFloatingDouble
f = \ x y -> + $dNum (sqrt $dFloating x) y
You can see here that both calls are dictionary-based and so very similar in internal structure, but already by the time we've desugared, the g function takes type and dictionary arguments that are absent from the f definition, which instead uses global variables for the needed dictionaries.
I guess I'd call that a substantial difference in the AST. The rest is just optimization -- specifically, optimizing the lookups in constant, known dictionaries for f and inlining the results.
It might also help to consider the simpler:
f :: Int -> Int
f x = x
g :: a -> a
g x = x
These, too, diverge after desugaring, and the difference persists in core through to the end of optimization:
-- ghc -ddump-simpl -dsuppress-all -dsuppress-uniques -fforce-recomp -O2
f = \ x -> x
g = \ @ a x -> x
I'd argue that these are two clearly different Core functions, since g takes an extra type argument. It's only after types are erased, when Core is rewritten to STG, that the two functions "come back together" at the term level:
-- ghc -ddump-stg -dsuppress-all -dsuppress-uniques -fforce-recomp -O2
f = \r [x] x;
g = \r [x] x;