Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why does GHC typecheck before desugaring?

Tags:

Is there a good reason to run the typechecker first? It would seem that the typechecker would be vastly simpler if it ran on a smaller syntax, especially because with the current system every syntax extension needs to touch the typechecker. This question applies especially to arrow syntax, the typechecking of which as described in comments here is known to be bogus.

I imagine one reason for this would be not emitting errors that mention generated code, but this situation is already covered in cases where a deriving clause fails to typecheck; GHC knows that code was generated.

like image 660
Dan Avatar asked Jul 08 '14 18:07

Dan


People also ask

How GHC works?

GHC compiles Haskell code either directly to native code or using LLVM as a back-end. GHC can also generate C code as an intermediate target for porting to new platforms. The interactive environment compiles Haskell to bytecode, and supports execution of mixed bytecode/compiled programs.

What is GHC written in?

GHC itself is written in Haskell, but the runtime system for Haskell, essential to run programs, is written in C and C--.


1 Answers

There is a section on this question in the GHC article found in volume 2 of the book "The Architecture of Open Source Application":

Type Checking the Source Language

One interesting design decision is whether type checking should be done before or after desugaring. The trade-offs are these:

  • Type checking before desugaring means that the type checker must deal directly with Haskell's very large syntax, so the type checker has many cases to consider. If we desugared into (an untyped variant of) Core first, one might hope that the type checker would become much smaller.

  • On the other hand, type checking after desugaring would impose a significant new obligation: that desugaring does not affect which programs are type-correct. After all, desugaring implies a deliberate loss of information. It is probably the case that in 95% of the cases there is no problem, but any problem here would force some compromise in the design of Core to preserve some extra information.

  • Most seriously of all, type checking a desugared program would make it much harder to report errors that relate to the original program text, and not to its (sometimes elaborate) desugared version.

Most compilers type check after desugaring, but for GHC we made the opposite choice: we type check the full original Haskell syntax, and then desugar the result. It sounds as if adding a new syntactic construct might be complicated, but (following the French school) we have structured the type inference engine in a way that makes it easy. Type inference is split into two parts:

  • Constraint generation: walk over the source syntax tree, generating a collection of type constraints. This step deals with the full syntax of Haskell, but it is very straightforward code, and it is easy to add new cases.

  • Constraint solving: solve the gathered constraints. This is where the subtlety of the type inference engine lies, but it is independent of the source language syntax, and would be the same for a much smaller or much larger language.

On the whole, the type-check-before-desugar design choice has turned out to be a big win. Yes, it adds lines of code to the type checker, but they are simple lines. It avoids giving two conflicting roles to the same data type, and makes the type inference engine less complex, and easier to modify. Moreover, GHC's type error messages are pretty good.

like image 177
bennofs Avatar answered Oct 07 '22 16:10

bennofs