I'd like to understand how simple arithmetic expressions are compiled into GHC 9.0.1 Core.
For this purpose, I've isolated the same declaration,
f = \x y -> sqrt x + y
and compiled it at two different types:
f :: Double -> Double -> Double
f :: Floating a => a -> a -> a
resulting in these very different beasts:
\ (x [Dmd=<S,1*U(U)>] :: Double) (y [Dmd=<S,1*U(U)>] :: Double) ->
case x of { D# x ->
case y of { D# y -> D# (+## (sqrtDouble# x) y) }
}
in the monomorphic case and
\ (@a)
($dFloating_a22t [Dmd=<S(S(S(C(C(S))LLLLLL)LLL)LLLLLLLLLLLLLLLLLLLLLL),U(1*U(1*U(1*C1(C1(U)),A,A,A,A,A,A),A,A,A),A,A,A,1*C1(U),A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A)>]
:: Floating a)
(eta_B0 :: a)
(eta_B1 :: a) ->
+ @a
($p1Fractional @a ($p1Floating @a $dFloating_a22t))
(sqrt @a $dFloating_a22t eta_B0)
eta_B1
in the polymorphic one.
Besides the demand/strictness annotations (the Dmd=... stuff), which deserve a separate question, I'm curious to understand the different shapes of the two resulting expressions. In the monomorphic case we do a nested case match, but in the polymorphic one we only see the operators applied to type variables (@a) and constraints ($p1Fractional etc.).
I guess I should add that the above output is obtained via a custom plugin pass that runs at the very end of the preexisting CoreToDo list, so after desugaring etc. (that's about all I know about this subject btw)
Is there a GHC stage in which these two compiled versions are structurally identical?
Internally a Double is a data constructor with one field containing a primitive Double# in the underlying machine's format (a 64-bit word).
data Double = D# Double#
This is a boxed representation, so a Double may also be a thunk. It is the role of a case expression to evaluate any thunk and then access the underlying machine Double#.
Hence, your function Double -> Double -> Double, after simplification, compiles to a function that matches on the arguments and performs primitive operations on the underlying values.
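As an illustration, the Core you got for the monomorphic f can be written out by hand in source Haskell (a sketch using GHC's primop exports; the real Core is generated by the compiler, not written like this):

```haskell
{-# LANGUAGE MagicHash #-}
import GHC.Exts (Double (D#), sqrtDouble#, (+##))

-- Hand-written analogue of the Core for the monomorphic f:
-- each case forces a possible thunk and exposes the raw Double#,
-- the primops work on the unboxed values, and D# reboxes the result.
f :: Double -> Double -> Double
f x y = case x of
  D# x# -> case y of
    D# y# -> D# (sqrtDouble# x# +## y#)
```

Compiling this with -ddump-simpl should give essentially the same Core as the original boxed definition, since the simplifier arrives at this shape on its own.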
Type classes are compiled to dictionaries.
class Floating a where
sqrt :: a -> a
...
-- becomes a record --
data Floating a = MkFloating {
sqrt :: a -> a,
... }
Polymorphic functions become dictionary-passing functions:
Floating a => a -> a -> a
-- becomes --
Floating a -> a -> a -> a
Method invocations become field accesses:
sqrt x
-- becomes --
sqrt floatingDict x
-- where floatingDict :: Floating a is constructed from dictionary
-- parameters in scope and from instances (which are compiled to
-- dictionaries or functions on dictionaries)
(NB: this is somewhat specific to GHC; other Haskell compilers, though they are rare, may do things differently.)
Is there a GHC stage in which these two compiled versions are structurally identical?
No, why would they be?
The two functions:
f :: Double -> Double -> Double
f = \x y -> sqrt x + y
g :: Floating a => a -> a -> a
g = \x y -> sqrt x + y
diverge very early in the compilation process. Using GHC 8.6.5 with -O0, you can see that the term-level representations are identical after type checking (-ddump-tc):
-- ghc -ddump-tc -dsuppress-all -dsuppress-uniques -fforce-recomp -O0
g = \ x y -> sqrt x + y
f = \ x y -> sqrt x + y
but have already diverged after desugaring, before the "simple optimizer" pass (-ddump-ds-preopt):
-- ghc -ddump-ds-preopt -dsuppress-all -dsuppress-uniques -fforce-recomp -O0
g = \ @ a $dFloating ->
let {
$dFloating = $dFloating } in
let {
$dFractional = $p1Floating $dFloating } in
let {
$dNum = $p1Fractional $dFractional } in
let {
$dNum = $dNum } in
\ x y -> + $dNum (sqrt $dFloating x) y
$dNum = $fNumDouble
$dFloating = $fFloatingDouble
f = \ x y -> + $dNum (sqrt $dFloating x) y
You can see here that both calls are dictionary-based and so very similar in internal structure, but already by the time we've desugared, the g function takes type and dictionary arguments that are absent from the f definition, which instead uses global variables for the needed dictionaries.
I guess I'd call that a substantial difference in the AST. The rest is just optimization -- specifically, optimizing the lookups in constant, known dictionaries for f and inlining the results.
It might also help to consider the simpler:
f :: Int -> Int
f x = x
g :: a -> a
g x = x
These, too, diverge after desugaring, and the difference persists in core through to the end of optimization:
-- ghc -ddump-simpl -dsuppress-all -dsuppress-uniques -fforce-recomp -O2
f = \ x -> x
g = \ @ a x -> x
I'd argue that these are two clearly different Core functions, since g takes an extra type argument. It's only after types are erased, when Core is rewritten to STG, that the two functions "come back together" at the term level:
-- ghc -ddump-stg -dsuppress-all -dsuppress-uniques -fforce-recomp -O2
f = \r [x] x;
g = \r [x] x;