Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Types for parser combinators

If I have a parser a : Parser A and a parser b : Parser B then I can combine it into a parser a | b : Parser (Either A B). This works but gets a little tricky when you start adding more alternatives and getting types like Either A (Either B C). I can imagine flattening the previous type into something like Alternative A B C. Is there a standard transformation I can perform or am I stuck with generating a whole bunch of boilerplate for types like Alternative A B C ....

like image 685
David K. Avatar asked Aug 20 '14 21:08

David K.


2 Answers

So the interesting thing about Either is that you can use it as a type-level cons operator.

A `Either` (B `Either` (C `Either` (D `Either` Void))) --> [A,B,C,D]

So all we need do is make that explicit. You'll need ghc-7.8 to support closed data families:

{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE TypeOperators #-}
{-# LANGUAGE DataKinds #-}
-- ...

type family OneOf (as :: [*]) :: * where
  OneOf '[a] = a
  OneOf (a ': as) = Either a (OneOf as)

Now you can write your types much more succinctly:

aorborc :: Parser (OneOf '[A, B, C])
aorborc = a | (b | c)

It's still Either under the hood, so you can still easily interoperate with all existing code that uses Either, which is nice.

like image 76
rampion Avatar answered Sep 27 '22 15:09

rampion


Either is just one possible sum type in Haskell, and because of the ready made class instances and helper functions is useful for many cases, but becomes considerably clunkier when you nest it.

The best approach for a parser is to create your own data type that mirrors the structure you're parsing and parse directly into that. Let's make a partial toy example about a toy language.

data Statement = TypeDec String Type
                 DataDec String [Constructor]
                 FunctionDec String LambdaExpression

statement :: Parser Statement
statement = TypeDec <$> string "type " *> identifier <*> string " = " *> type
            <|> DataDec <$> string "data " *> identifier <*> string " = " *> many constructor
            <|> FunctionDec <$> identifier <*> string " = " *> lambdaExpression

In this way, both your data structure and your code mirror the productions in the grammar you're parsing. The great benefit to that is that your data is type safe, clear and ready to use as soon as it's parsed.

(I can never remember the fixities of *> and <*, so I've probably done it the way you need brackets or something, but hopefully you get the idea.)

like image 44
AndrewC Avatar answered Sep 27 '22 17:09

AndrewC