To learn how to write and parse a context-free grammar I want to choose a tool. For Haskell, there are two big options: Happy, which generates a parser from a grammar description and *Parsec, which allows you to directly code a parser in Haskell.
What are the (dis)advantages of either approach?
Parser combinators are generally slower than a hand-written or code-generated parser. That's somewhat innate due to the overhead of “threading” (for lack of a better word) your control flow through many function calls.
A parser generator takes a grammar as input and automatically generates source code that can parse streams of characters using the grammar. The generated code is a parser, which takes a sequence of characters and tries to match the sequence against the grammar.
Java Compiler Compiler (JavaCC) is the most popular parser generator for use with Java applications. A parser generator is a tool that reads a grammar specification and converts it to a Java program that can recognize matches to the grammar.
A Parser combinator, as wikipedia describes it, is a higher-order function that accepts several parsers as input and returns a new parser as its output. They can be very powerful when you want to build modular parsers and leave them open for further extension.
External vs internal DSL
The parser specification format for Happy is an external DSL, whereas with Parsec you have the full power of Haskell available when defining your parsers. This means that you can for example write functions to generate parsers, use Template Haskell and so on.
Precedence rules
With Happy, you can use precedences to simplify your grammar, whereas with Parsec you have to nest the grammar rules correctly yourself. Changing the precedence of an operator is therefore much more tedious in Parsec.
Static checking
Happy will warn you about ambiguities in your grammar at compile time. (Though it's not great at telling you where they are.) With Parsec, you get no warning until your parser fails at run time.
This is the traditional decision: do I use lex/yacc (happy) or do I write my own (mostly recursive descent) parser, only that the parsec library is like a DSL for doing it right.
If one has experience with the yacc/lex approach, using happy will be a smaller learning curve.
In my opinion Parsec hides most of the nasty grammar details and lets you write your parsers more intuitively. If you want to learn this stuff in the first place go with some parser-generator like Happy (or even try to implement one yourself).
I'm used to the parser combinator library uu-parsinglib from utrecht university. One can have error correcting and permutations for free, and also the things that parsec has. I also like it because my implemented grammar looks like an EBNF grammar, without so much monadic stuff, and is easy to read.
Naive parser combinators do not allow left-recursion in grammar rules and I haven't found a library that does.
Happy does allow full BNF in language spec, and some useful staff like priority rules. So, for complicated cases Happy and parser generators in general are much better. However, in case of simple, stupid languages with LL(k) parseable grammars, I would use a parser combinator library as more maintainer-friendly.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With