The typical examples for the benefits of a GADT are representing the syntax for a DSL; say here on the wiki or the PLDI 2005 paper.
I can see that if you have a AST that's type-correct by construction, writing an eval
function is easy.
How to build the GADT handling into a REPL? Or more specifically into a Read-Parse-Typecheck-Eval-Print-Loop? I'm seeing that you just push the complexity from the eval
step into earlier steps.
Does GHCi use a GADT internally to represent expressions it's evaluating? (The expressions are a lot chunkier than a typical DSL.)
For one thing, you can't derive Show
for a GADT, so for the Print step you either hand-roll Show
instances or something like this:
{-# LANGUAGE GADTs, StandaloneDeriving #-}
data Term a where
Lit :: Int -> Term Int
Inc :: Term Int -> Term Int
IsZ :: Term Int -> Term Bool
If :: Term Bool -> Term a -> Term a -> Term a
Pair :: (Show a, Show b) => Term a -> Term b -> Term (a,b)
Fst :: (Show b) => Term (a,b) -> Term a
Snd :: (Show a) => Term (a,b) -> Term b
deriving instance (Show a) => Show (Term a)
(It seems to me those Show
constraints tangled in the constructors are already failing to separate concerns.)
I'm more thinking about the user-experience for somebody entering DSL expressions, rather than the programmer's convenience of the eval
function. Either:
Read
instance is real hard work. So perhapsAt that last bullet, I seem to be back with 'smart constructors', that GADTs are supposed to improve on(?) What's more I've doubled the work somewhere.
I don't have a 'better way' to approach it. I'm wondering how to approach DSL applications in practice. (For context: I'm thinking about a database query environment, where type inference has to look at the types of the fields in the database to validate what operations on them.)
Addit: after working through the answer from @Alec
I see the code for pretty printing in glambda
involves several layers of classes and instances. Something feels wrong here as opposed to what are the claimed advantages of a GADT for an AST. The idea of a (well-typed) AST is you can equally easily: eval it; or pretty-print it; or optimise it; or code-generate from it; etc.
glambda
's seems to be aimed at eval'ing (which is fair enough given the purpose of the exercise). I'm wondering ...
Why the need to express the whole syntax for the (E)DSL in one datatype? (The wikibook example starts its straw man doing that data Expr = ...
; and rapidly runs into type trouble. Well of course it does; that's never going to work; almost anything would work better than that; I feel cheated.)
If we end up writing classes and instances anyway, why not make each syntax production a separate datatype: data Lit = Lit Int
... data If b a1 a2 = If b a1 a2
... Then a class IsTerm a c | a -> c where ...
(i.e. a FunDep
or maybe a Type Family whose instances tell us the Term's result-type.)
Now the EDSL uses the same constructors (the user doesn't care they're from different datatypes); and they apply 'sloppy' type-checking. Pretty printing/error reporting also doesn't need tight typechecking. Eval does, and insists on the IsTerm
instances all lining up.
I didn't suggest this approach before, because it seemed to involve too much crufty code. But actually it's no worse than glambda
-- that is, when you consider the whole functionality, not just the eval step.
It seems to me a big advantage to express the syntax only once. Furthermore it seems more extensible: add a new datatype per syntax production, rather than breaking open an existing datatype. Oh, and because they're H98 datatypes (no existentials), deriving
works fine.
Note that GHCi does not use GADTs to represent expressions. Even GHC's internal core expression type Expr
is not a GADT.
For the purpose of having a larger more fleshed out example of your Term
type, consider glambda
. Its Exp
type even tracks variables at the type level.
There is a second UExp
data type which, as you observed yourself, is what gets actually parsed from the REPL. This type then gets typechecked into Exp
and passed on to a continuation with:
check :: (MonadError Doc m, MonadReader Globals m)
=> UExp -> (forall t. STy t -> Exp '[] t -> m r)
-> m r
Pretty-printing of UExp
and Exp
is hand-written, but at least uses the same code (it does this via a PrettyExp
class).
The evaluation code itself is beautiful, but I doubt I need to sell you on that. :)
As I understand it, GADTs are splendid for EDSLs (embedded DSLs), since these are just portions of code in a large Haskell program. Yes, type errors can be complicated (and will come from GHC directly), but that's the price you pay for being able to maintain type-level invariants in your code. Consider, for instance, hoopl
's representation of basic blocks in a CFG:
data Block n e x where
BlockCO :: n C O -> Block n O O -> Block n C O
BlockCC :: n C O -> Block n O O -> n O C -> Block n C C
BlockOC :: Block n O O -> n O C -> Block n O C
BNil :: Block n O O
BMiddle :: n O O -> Block n O O
BCat :: Block n O O -> Block n O O -> Block n O O
BSnoc :: Block n O O -> n O O -> Block n O O
BCons :: n O O -> Block n O O -> Block n O O
Sure, you open yourself up to nasty type errors, but you also have the ability to track fallthrough information at the type-level. This makes it much easier to think about dataflow problems.
The point I'm trying to make is: if your GADT is being constructed from a String
(or a custom REPL), you'll have a rough time performing the translation. That's unavoidable because what you are doing is essentially re-implementing a simple type-checker. Your best bet is to confront this head on (as glambda
does) and distinguish the parsing from the type-checking.
However, if you can afford to stay within the bounds of Haskell code, you can just hand parsing and typechecking to GHC. IMHO, EDSLs are way cooler and more practical that non-embedded DSLs.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With