Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Parsec vs Yacc/Bison/Antlr: Why and when to use Parsec?

Tags:

haskell

parsec

I'm new to Haskell and Parsec. After reading Chapter 16 Using Parsec of Real World Haskell, a question appeared in my mind: Why and when is Parsec better than other parser generators like Yacc/Bison/Antlr?

My understanding is that Parsec creates a nice DSL of writing parsers and Haskell makes it very easy and expressive. But parsing is such a standard/popular technology that deserves its own language, which outputs to multiple target languages. So when shall we use Parsec instead of, say, generating Haskell code from Bison/Antlr?

This question might go a little beyond technology, and into the realm of industry practice. When writing a parser from scratch, what's the benefit of picking up Haskell/Parsec compared to Bison/Antlr or something similar?

BTW: my question is quite similar to this one but wasn't answered satisfactorily there.

like image 649
Ning Wang Avatar asked Feb 20 '11 05:02

Ning Wang


People also ask

What is yacc and bison?

Bison originated as a workalike of a program called Yacc — Yet Another Compiler Compiler. 9. Yacc was written at Bell Labs as part of the very early development of Unix; one of its first uses was to develop the original Portable C Compiler, pcc. The same person, Steven C. Johnson, wrote Yacc and the original pcc.

Do people still use YACC?

Yes, this stuff is still used.


2 Answers

One of the main differences between the tools you listed, is that ANTLR, Bison and their friends are parser generators, whereas Parsec is a parser combinator library.

A parser generator reads in a description of a grammar and spits out a parser. It is generally not possible to combine existing grammars into a new grammar, and it is certainly not possible to combine two existing generated parsers into a new parser.

A parser combinator OTOH does nothing but combine existing parsers into new parsers. Usually, a parser combinator library ships with a couple of trivial built-in parsers that can parse the empty string or a single character, and it ships with a set of combinators that take 1 or more parsers and return a new one that, for example, parses the sequence of the original parsers (e.g. you can combine a d parser and an o parser to form a do parser), the alternation of the original parsers (e.g. a 0 parser and a 1 parser to a 0|1 parser) or parses the original parse multiple times (repetetion).

What this means is that you could, for example, take an existing parser for Java and an existing parser for HTML and combine them into a parser for JSP.

Most parser generators don't support this, or only support it in a limited way. Parser combinators OTOH only support this and nothing else.

like image 63
Jörg W Mittag Avatar answered Sep 29 '22 08:09

Jörg W Mittag


You might want to see this question as well as the linked one in your question.

Which Haskell parsing technology is most pleasant to use, and why?

In Haskell the competition is between Parsec (and other parser combinators) and the parser generator Happy. I'd pick Happy if I already had an LR grammar to work from - parser combinators take grammars in LL form and the translation from LR to LL takes some effort and a combinator parser will usually be significantly slower. If I don't have a grammar I'll use Parsec, it is more flexible (powerful) than Happy and its more fun to work "in Haskell" than generate code with Happy and Alex. If you use Happy for parsing you almost always need to use Alex for lexing.

For industry practice, it would be odd to decide to use Haskell just to get Parsec. For parsing, most of the current crop of languages will have at least a parser generator and probably something more flexible like a port of Parsec or a PEG system.

Ira Baxter's answer to the linked question was spot-on about a parser getting you merely to the foothold of the Himalayas for writing a translator, but being part of a translator is only one of the uses for a parser, so there are still many domains where fairly minimalist systems like ANTLR, Happy and Parsec are satisfactory.

like image 22
stephen tetley Avatar answered Sep 29 '22 09:09

stephen tetley