Consider a simple function from the recent question:
myButLast :: [a] -> a
myButLast [x, y] = x
myButLast (x : xs) = myButLast xs
myButLast _ = error "List too short"
We can ask GHC to give us the simplifier output with ghc -ddump-simpl
. (Possibly with some
additional flags like -dsuppress-module-prefixes
-dsuppress-uniques
.) As I understand, it
is the last stage of compilation where the result still has any semblance to the original high
level code. So here is what it says:
-- RHS size: {terms: 21, types: 22, coercions: 0, joins: 0/0}
myButLast :: forall a. [a] -> a
[GblId,
Arity=1,
Str=<S,1*U>,
Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True,
WorkFree=True, Expandable=True, Guidance=IF_ARGS [30] 100 0}]
myButLast
= \ (@ a) (ds :: [a]) ->
case ds of {
[] -> myButLast1 @ a;
: x ds1 ->
case ds1 of {
[] -> myButLast1 @ a;
: y ds2 ->
case ds2 of {
[] -> x;
: ipv ipv1 -> myButLast_$smyButLast1 @ a y ipv ipv1
}
}
}
What is happening here? Let us see.
To the type signature, now with an explicit quantifier, some sort of annotations are attached.
I may guess that they say "global identifier, unary, top level", which is all true for this
function. The other annotations, like WorkFree=True
, Str=<S,1*U>
, are to me cryptic.
The "value" definition is now a lambda that accepts, in addition to a list, a type variable
argument, and proceeds to study the list by case analysis. [] -> myButLast1 @ a
is a glorified
error call, so let us ignore it for now. The interesting part is the call to
myButLast_$smyButLast1
(What kind of name is that? I thought $
sign could not be a part of an
identifier.), which turns out to be a tail recursive function that actually traverses the list.
And here it is, a single member of what we recognize as a mutually recursive block:
Rec {
-- RHS size: {terms: 13, types: 12, coercions: 0, joins: 0/0}
myButLast_$smyButLast1 [Occ=LoopBreaker]
:: forall a. a -> a -> [a] -> a
[GblId,
Arity=3,
Caf=NoCafRefs,
Str=<L,1*U><L,1*U><S,1*U>,
Unf=OtherCon []]
myButLast_$smyButLast1
= \ (@ a) (sc :: a) (sc1 :: a) (sc2 :: [a]) ->
case sc2 of {
[] -> sc;
: ipv ipv1 -> myButLast_$smyButLast1 @ a sc1 ipv ipv1
}
end Rec }
It is quite lucid, but it does have some features new to us, like the recursive block
delimiter Rec ... end Rec
and a cryptic remark [Occ=LoopBreaker]
. The annotations are also
different: the Unf
array is empty, and a Caf
field appears instead. I can only infer that
the Unf
field being interesting is a quality of names defined by the programmer, while
myButLast_$smyButLast1
is created by the compiler.
So, I can understand about half of what the simplifier gives me, by line count, but of some parts I cannot even begin to guess the meaning.
Is the premise, that the simplifier output is usually the most useful intermediate representation, correct?
Is my reading correct so far?
Is there a manual to all these cryptic remarks? What do they mean?
There is no manual or standalone documentation on Core that I'm aware of that goes into the kind of detail you are looking for. There is of course this page on Core from the Wiki, but it just explains the Core language at a high level and does so in mostly in terms of the compiler data structures used to represent a Core abstract syntax tree, rather than the concrete "pretty print" syntax.
The only way to get the detail you want is to download a copy of the GHC source and start looking through the code in ghc/compiler/coreSyn/
.
GHC is big and complicated, but most of it is written in Haskell, and so much of the code is very high level and quite readable, and the source code is heavily commented with excellent explanations and notes sprinkled everywhere.
If you want to know what WorkFree=True
means, for example, you would:
Find the code in PprCore.hs
that generates the annotation, to determine that it's the uf_is_work_free
field of CoreUnfolding
.
Check the definition of CoreUnfolding
in CoreSyn.hs
and its related comments, where you can see that an Unfolding
is a representation of an identifier that can be substituted while inlining, and the uf_is_work_free
flag is a cached copy of exprIsWorkFree
that somehow indicates that inlining the unfolding doesn't "waste work".
Look through comments in CoreSyn.hs
, CoreUnfold.hs
, and CoreUtils.hs
to find an additional explanation of exprIsWorkFree
and what it means:
exprIsWorkFree
is used when deciding whether to inline something; we don't inline it if doing so might duplicate work, by peeling off a complete copy of the expression.
An example is given:
let x = a #+ b in x +# x -- I think `#+` is a typo and should be `+#`
where it's pointed out that x
is not "work free", since if it was inlined on the RHS, that would cause the a +# b
operation to be evaluated twice
In your case, the version of myButLast
in your Core output is work-free because it doesn't evaluate any expressions independently of the arguments that could be reused every time myButLast
is applied.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With