Choose an AST representation in Haskell

Question

I am currently working on a toy functional language compiler, with the purpose of learning type-related things and slightly advanced techniques in Haskell.

I believe I need to attach info to each node on my tree, like position in the original source code for better error reporting, type inferred, type constraints generated, etc. So initially I chose this method:

data Expr a = ELam [Pat a]  (Expr a) a
            | ELet [Decl a] (Expr a) a
            | EIf  (Expr a) (Expr a) (Expr a) a
            | ECase (Expr a) [Alt a] a
            | EApp (Expr a) [Expr a] a
            | EVar Variable a
            | ECon Variable a
            | ELit Literal  a
            | ELOpSec (Op a) (Expr a) a
            | EROpSec (Op a) (Expr a) a
            | EInfix  (Expr a) (Op a) (Expr a) a
            | ENeg    (Expr a) a
-- Intermediate representation, removed after resolving infix expressions.
            | EParen  (Expr a) a
            deriving (Eq, Show, Functor, Generic1)

Functor is derived for effectively transforming node attached, while Generic1 allows a generic function to get that information out.

However, after I have finished the parser, I found the structure is a bit unfriendly for future-processing. Essentially I want a two layer structure, as the one mentioned here. Something I am trying:

data Expr w = ELam  [w (Pat w)]  (w (Expr w))
            | ELet  [w (Decl w)] (w (Expr w))
            | EIf   (w (Expr w)) (w (Expr w)) (w (Expr w))
            | ECase (w (Expr w)) [w (Alt w)]
            | EApp  (w (Expr w)) [w (Expr w)]
            | EVar  (w Var)
            | ECon  (w Var)
            | ELit  (w Lit)
            | ELOpSec (w Var) (w (Expr w))
            | EROpSec (w Var) (w (Expr w))
            | EInfix  (w (Expr w)) (w Var) (w (Expr w))
            | ENeg    (w (Expr w))

The initial motivation is for more convenient/beautiful pattern matching, as the unnecessary information can be separated out. Besides, I have just read a gentle introduction to recursion scheme so I hope I could take advantages of existing library.

However, the above code I was trying was rejected by GHC. More precisely, I can not use it to derive Eq instance automatically. I tried a complicated standalone derivation:

deriving instance (Eq (a b))   => Eq   (Expr a)

But it is rejected after enabling several fancy extensions. I noticed the recursion-scheme library has the following code:

newtype Fix f = Fix (f (Fix f))

deriving instance Eq (f (Fix f)) => Eq (Fix f)

But I couldn't understand the instance—how are we gonna satisfy Eq (f (Fix f))?

I noticed that in the ezyang's blog addressing AST typing approaches, he used:

data ExpF a = Num Int
            | Bool Bool
            | Var Var
            | If a a a
            | Lambda Var a
            | App a a
newtype Exp = Exp (ExpF Exp)
newtype TExp = TExp (ExpF TExp, Type)

But I have different types of node inside Expr, e.g. patterns and declarations.

My question are:

Is there a way to use a two-layer type structure to express my AST? How can I derive Eq and Show instances?
From a long term of view, is it better to just stick with my original old plain AST types?

Thanks!

Cirdec · Accepted Answer

An Eq instance for Expr w is fortunately fairly straightforward to write. When writing an Eq instance we'd normally ask for an Eq instance for all of the types used in the data type. For example, if we were going to write an Eq instance for

data E a b = A a
           | B b

We'd ask for a way to compare as and bs

{-# LANGUAGE StandaloneDeriving #-}

deriving instance (Eq a, Eq b) => Eq (E a b)

If a type is higher kinded ...

data E w = EA (w A)
         | EB (w B)

... we still ask for a way to compare each of the types that occurred anywhere in E

{-# LANGUAGE FlexibleContexts #-}

deriving instance (Eq (w A), Eq (w B)) => Eq (E a b)

The same thing can be done recursively, begging for a way to compare the inner types when deriving how to compare the data type. Recursive constraints require UndecidableInstances, in addition to the extensions used above.

{-# LANGUAGE UndecidableInstances #-}

data E w = EA (w A)
         | EB (w B)
         | EE (w (E w))

deriving instance (Eq (w A), Eq (w B), Eq (w (E w))) => Eq (E a b)

This is also what the recursion-schemes code for Fix is doing, but the only type used in Fix is f (Fix f), so the constraint looks kind of magical.

newtype Fix f = Fix (f (Fix f))

deriving instance Eq (f (Fix f)) => Eq (Fix f)

An Eq instance for Expr w is written the same way. The constraint asks for a way to compare each of the types that occurs in Expr w.

{-# LANGUAGE StandaloneDeriving #-}
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE UndecidableInstances #-}

deriving instance (
    Eq (w (Pat w)),
    Eq (w (Decl w)),
    Eq (w (Alt w)),
    Eq (w Var),
    Eq (w Lit),
    Eq (w (Expr w))) => Eq (Expr w)

Models

The data type Expr w has the somewhat unusual kind (* -> *) -> *, which I call a Model. It is the kind of things that, given a Functor or Monad, which both have kind * -> *, produces a data type (which has kind *). An Expr Identity is an exact model of an expression. An Expr IO could model an expression whose value is constructed by interaction with the outside world. An Expr ref could model an Expr distributed over multiple machines, where the ref type describes where to go to get the other components, like an identifier to a database record. An Expr Proxy is just the constructors of an Expr, without any of the data. You are probably looking for something like the Expr ((,) a) which models an expression with an annotation of type a at each component.

You're actually probably looking for (,) a (Expr ((,) a)), which has an annotation for the entire structure as well as for each of its components. This looks a bit too much like garbage. To clean it up, we'll add a type synonym for data models that are in some functor.

type In f d = f (d f)

One possible type for an expression read from source is the expression annotated with the line number and column where each component is from.

type SourcePos = (Int, Int)
type SourceExpr = In ((,) SourcePos) Expr

I'm going to make some examples of SourceExpr by hand to check that the derived Eq instance works.

example1 :: SourceExpr  
-- if Var then Con else Lit
example1 = ((1, 1) , EIf ((1, 4), EVar ((1, 4) , Var)) ((1, 7) , ECon ((1, 12) , Var)) ((1, 16) , ELit ((1, 12) , Lit)))

example2 :: SourceExpr
-- if Var then Lit else Con
example2 = ((1, 1) , EIf ((1, 4) , EVar ((1, 4) , Var)) ((1, 7) , ELit ((1, 12) , Lit)) ((1, 16) , ECon ((1, 12) , Var)))

example3 :: SourceExpr
-- if Var
-- then Con
-- else Lit
example3 = ((1, 1) , EIf ((1, 4) , EVar ((1, 4) , Var)) ((2, 1) , ECon ((2, 6) , Var)) ((3, 1) , ELit ((3, 6) , Lit)))

For testing, I also made data types for Pat et. al., each with a single eponymous constructor. The dervied Eq instance works satisfactorily.

main = do
    print $ example1 == example1
    print $ example1 == example2
    print $ example1 == example3
    print $ example2 == example2
    print $ example2 == example3
    print $ example3 == example3

Onward

From a long term of view, is it better to just stick with my original old plain AST types?

There isn't much existing Haskell support for working with models or data types with the kind (* -> *) -> *. If you start working with them the first thing you'll want is a class for data models d that are functors that map some natural transformation (forall a. f a -> g a) -> d f -> d g. You'd use this for example to remove the annotations from a source expression. The next thing you'll want is a lens-style uniplate library for data models, and any Applicative-like tools needed to support it.

You would be straying into the less-well-known. There are a couple ways to come back, but I haven't completely thought through either of them.

If you don't like the UndecidableInstances requirement for derived instances for Expr you could remove the explicit recursion, and add an argument for the model used for recursive occurrences of Expr. This would be compatible with a PolyKinded version of MMorph.

data Expr w e = ELam  [w (Pat w)]  (w (e w))
              | ELet  [w (Decl w)] (w (e w))
              | EIf   (w (e w)) (w (e w)) (w (e w))
              | ECase (w (e w)) [w (Alt w)]
              | EApp  (w (e w)) [w (e w)]
              | EVar  (w Var)
              | ECon  (w Var)
              | ELit  (w Lit)
              | ELOpSec (w Var) (w (e w))
              | EROpSec (w Var) (w (e w))
              | EInfix  (w (e w)) (w Var) (w (e w))
              | ENeg    (w (e w))

The type e only ever appears in the type e w, which has kind *. If we replace e w with a, it suggests another slightly more generic type that is more in line with the types that have good existing library support. This Expr has kind (* -> *) -> (* -> *) which is much more common; it is the kind of monad transformers.

data Expr w a = ELam  [w (Pat w)]  (w a)
              | ELet  [w (Decl w)] (w a)
              | EIf   (w a) (w a) (w a)
              | ECase (w a) [w (Alt w)]
              | EApp  (w a) [w a]
              | EVar  (w Var)
              | ECon  (w Var)
              | ELit  (w Lit)
              | ELOpSec (w Var) (w a)
              | EROpSec (w Var) (w a)
              | EInfix  (w a) (w Var) (w a)
              | ENeg    (w a)

Either way you'd probably still be wanting a lens-style uniplate library compatible with your Exprs.

Choose an AST representation in Haskell

Tags:

haskell

abstract-syntax-tree

type-systems

Minsheng Liu

1 Answers

Models

Onward

Cirdec

Recent Activity

Donate For Us

Choose an AST representation in Haskell

Tags:

haskell

abstract-syntax-tree

type-systems

Minsheng Liu

1 Answers

Models

Onward

Cirdec

Related questions

Recent Activity

Donate For Us