Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to read simplifier output?

Consider a simple function from the recent question:

myButLast :: [a] -> a
myButLast [x, y] = x
myButLast (x : xs) = myButLast xs
myButLast _ = error "List too short"

We can ask GHC to give us the simplifier output with ghc -ddump-simpl. (Possibly with some additional flags like -dsuppress-module-prefixes -dsuppress-uniques.) As I understand, it is the last stage of compilation where the result still has any semblance to the original high level code. So here is what it says:

-- RHS size: {terms: 21, types: 22, coercions: 0, joins: 0/0}
myButLast :: forall a. [a] -> a
[GblId,
 Arity=1,
 Str=<S,1*U>,
 Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True,
         WorkFree=True, Expandable=True, Guidance=IF_ARGS [30] 100 0}]
myButLast
  = \ (@ a) (ds :: [a]) ->
      case ds of {
        [] -> myButLast1 @ a;
        : x ds1 ->
          case ds1 of {
            [] -> myButLast1 @ a;
            : y ds2 ->
              case ds2 of {
                [] -> x;
                : ipv ipv1 -> myButLast_$smyButLast1 @ a y ipv ipv1
              }
          }
      }

What is happening here? Let us see.

  1. To the type signature, now with an explicit quantifier, some sort of annotations are attached. I may guess that they say "global identifier, unary, top level", which is all true for this function. The other annotations, like WorkFree=True, Str=<S,1*U>, are to me cryptic.

  2. The "value" definition is now a lambda that accepts, in addition to a list, a type variable argument, and proceeds to study the list by case analysis. [] -> myButLast1 @ a is a glorified error call, so let us ignore it for now. The interesting part is the call to myButLast_$smyButLast1 (What kind of name is that? I thought $ sign could not be a part of an identifier.), which turns out to be a tail recursive function that actually traverses the list.

  3. And here it is, a single member of what we recognize as a mutually recursive block:

    Rec {
    -- RHS size: {terms: 13, types: 12, coercions: 0, joins: 0/0}
    myButLast_$smyButLast1 [Occ=LoopBreaker]
      :: forall a. a -> a -> [a] -> a
    [GblId,
     Arity=3,
     Caf=NoCafRefs,
     Str=<L,1*U><L,1*U><S,1*U>,
     Unf=OtherCon []]
    myButLast_$smyButLast1
      = \ (@ a) (sc :: a) (sc1 :: a) (sc2 :: [a]) ->
          case sc2 of {
            [] -> sc;
            : ipv ipv1 -> myButLast_$smyButLast1 @ a sc1 ipv ipv1
          }
    end Rec }
    

    It is quite lucid, but it does have some features new to us, like the recursive block delimiter Rec ... end Rec and a cryptic remark [Occ=LoopBreaker]. The annotations are also different: the Unf array is empty, and a Caf field appears instead. I can only infer that the Unf field being interesting is a quality of names defined by the programmer, while myButLast_$smyButLast1 is created by the compiler.

So, I can understand about half of what the simplifier gives me, by line count, but of some parts I cannot even begin to guess the meaning.

 

  • Is the premise, that the simplifier output is usually the most useful intermediate representation, correct?

  • Is my reading correct so far?

  • Is there a manual to all these cryptic remarks? What do they mean?

like image 697
Ignat Insarov Avatar asked Oct 26 '19 11:10

Ignat Insarov


1 Answers

There is no manual or standalone documentation on Core that I'm aware of that goes into the kind of detail you are looking for. There is of course this page on Core from the Wiki, but it just explains the Core language at a high level and does so in mostly in terms of the compiler data structures used to represent a Core abstract syntax tree, rather than the concrete "pretty print" syntax.

The only way to get the detail you want is to download a copy of the GHC source and start looking through the code in ghc/compiler/coreSyn/.

GHC is big and complicated, but most of it is written in Haskell, and so much of the code is very high level and quite readable, and the source code is heavily commented with excellent explanations and notes sprinkled everywhere.

If you want to know what WorkFree=True means, for example, you would:

  1. Find the code in PprCore.hs that generates the annotation, to determine that it's the uf_is_work_free field of CoreUnfolding.

  2. Check the definition of CoreUnfolding in CoreSyn.hs and its related comments, where you can see that an Unfolding is a representation of an identifier that can be substituted while inlining, and the uf_is_work_free flag is a cached copy of exprIsWorkFree that somehow indicates that inlining the unfolding doesn't "waste work".

  3. Look through comments in CoreSyn.hs, CoreUnfold.hs, and CoreUtils.hs to find an additional explanation of exprIsWorkFree and what it means:

    exprIsWorkFree is used when deciding whether to inline something; we don't inline it if doing so might duplicate work, by peeling off a complete copy of the expression.

    An example is given:

    let x = a #+ b in x +# x  -- I think `#+` is a typo and should be `+#`
    

    where it's pointed out that x is not "work free", since if it was inlined on the RHS, that would cause the a +# b operation to be evaluated twice

In your case, the version of myButLast in your Core output is work-free because it doesn't evaluate any expressions independently of the arguments that could be reused every time myButLast is applied.

like image 142
K. A. Buhr Avatar answered Oct 12 '22 00:10

K. A. Buhr