I've noticed a distinct lack of LL parser generators that target functional languages. What I've been looking for, without success, is something that generates a Haskell parser from an ANTLR-style LL(*) grammar (modulo minor reformatting of the grammar), and I was surprised that every last parser generator with a functional-language target I found was some kind of LR parser generator.
I'm working on a language with functional features, and I want to move its parser from ANTLR to one self-hosted in the language itself. It would help a lot if I could port something almost surely correct from another functional language (preferably one I'm familiar with, Haskell or Scala) instead of rewriting it entirely from scratch, though in the end I might do that anyway, since the core language is small.
At this point, even more than a solution, I'm curious why there are no such LL(*) or even LL(k) parser generators but many LR generators, since LL seems inherently easier.
Java Compiler Compiler (JavaCC) is the most popular parser generator for use with Java applications. A parser generator is a tool that reads a grammar specification and converts it to a Java program that can recognize matches to the grammar.
In a functional language such as Haskell, parsers can naturally be viewed as functions: type Parser = String -> Tree. That is, a parser is a function that takes a string and returns some form of tree.
For many grammars, the LR parsing tables can be generated automatically from the grammar. One of the most popular software systems that does this is available in the Unix programming environment; it is called yacc (yet another compiler-compiler).
The major reason is that most LL(k) parsers written in functional languages are simply implemented with parser combinators, because the easiest way to implement a parser combinator library is recursive descent.
Haskell's parsec, attoparsec, and polyparse and Scala's stock parser combinators all produce what are effectively LL(*) parsers.
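parsec requires you to use an explicit try combinator to get backtracking, but this is done purely for efficiency. attoparsec also provides try, although in its current releases parsers always backtrack on failure and try is kept for Parsec compatibility. The Scala parser combinators can also do packrat parsing.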
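As a rough illustration of what try buys you in parsec, here is a small sketch. The parser names (keywordNoTry, keywordTry) and the two runner functions are made up for this example; only parse, string, try and <|> come from parsec itself.

```haskell
import Text.Parsec (ParseError, parse, string, try, (<|>))
import Text.Parsec.String (Parser)

-- Without try: on the input "lemma", string "let" matches "le" and then
-- fails having consumed input, so parsec does not attempt the alternative.
keywordNoTry :: Parser String
keywordNoTry = string "let" <|> string "lemma"

-- With try: a failure inside try rewinds the input, so the second
-- alternative gets to run from the original position.
keywordTry :: Parser String
keywordTry = try (string "let") <|> string "lemma"

-- Convenience runners for experimenting in GHCi (hypothetical helpers).
runWithout, runWith :: String -> Either ParseError String
runWithout = parse keywordNoTry ""
runWith    = parse keywordTry ""
```

In GHCi, runWithout "lemma" should give a Left with a parse error, while runWith "lemma" should succeed. Making the rewind explicit is what lets parsec avoid holding on to input it will never need to revisit.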
Consider the following fragment from the announcement of Brent Yorgey's recent unbound package:
    parseAtom = parens parseTerm
            <|> var <$> ident
            <|> lam <$> brackets ident <*> parseTerm
It is pretty easy to see the original grammar in this fragment.
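To make the correspondence concrete, here is a minimal sketch of a backtracking combinator core written against base only, with a rule of the same shape on top of it. Everything in it is an assumption made for illustration: the Term type, its constructors Var and Lam, and the helpers symbol, ident, parens and brackets are hypothetical stand-ins, not the definitions from the unbound announcement or from parsec.

```haskell
import Control.Applicative (Alternative (..))
import Data.Char (isAlpha, isSpace)

-- Hypothetical AST, just enough to give the rule something to build.
data Term = Var String | Lam String Term
  deriving (Eq, Show)

-- A parser is a function from input to an optional result plus leftover input.
newtype Parser a = Parser { runParser :: String -> Maybe (a, String) }

instance Functor Parser where
  fmap f (Parser p) = Parser $ \s -> case p s of
    Nothing      -> Nothing
    Just (a, s') -> Just (f a, s')

instance Applicative Parser where
  pure a = Parser $ \s -> Just (a, s)
  Parser pf <*> Parser pa = Parser $ \s -> case pf s of
    Nothing      -> Nothing
    Just (f, s') -> case pa s' of
      Nothing       -> Nothing
      Just (a, s'') -> Just (f a, s'')

instance Alternative Parser where
  empty = Parser (const Nothing)
  -- Ordered choice with full backtracking: retry from the same input.
  Parser p <|> Parser q = Parser $ \s -> case p s of
    Nothing -> q s
    result  -> result

-- Match one specific character, then skip trailing whitespace.
symbol :: Char -> Parser Char
symbol c = Parser $ \s -> case s of
  (x : xs) | x == c -> Just (c, dropWhile isSpace xs)
  _                 -> Nothing

-- One or more letters, then skip trailing whitespace.
ident :: Parser String
ident = Parser $ \s -> case span isAlpha s of
  ([], _)      -> Nothing
  (name, rest) -> Just (name, dropWhile isSpace rest)

parens, brackets :: Parser a -> Parser a
parens p   = symbol '(' *> p <* symbol ')'
brackets p = symbol '[' *> p <* symbol ']'

parseTerm :: Parser Term
parseTerm = parseAtom

-- atom ::= '(' term ')' | ident | '[' ident ']' term
parseAtom :: Parser Term
parseAtom = parens parseTerm
        <|> Var <$> ident
        <|> Lam <$> brackets ident <*> parseTerm
```

Each alternative of the grammar rule in the comment maps to one line of parseAtom, and each function is just a recursive descent procedure, which is why this style ends up LL-flavoured without any generator step.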
LR parsers require much more complicated preprocessing to generate the tables they need to execute efficiently, since directly hand-encoding one with something like recursive ascent is pretty awful.
By implementing your parser combinators as an EDSL rather than an external tool you enable greater use of advanced features of your programming language. You can make portions of the grammar higher order, build the lexer hack directly into the parser, etc. Typical LR parser generators can't do these things, or can only offer them in ad hoc ways in limited contexts because of the need to be able to emit the tables in the end.
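A small sketch of what "higher order" can mean in practice, using parsec: a grammar rule that takes another rule as an argument. The names lexeme, listOf, number, name, numbers, nested and runNumbers are hypothetical helpers for this example; between, sepBy, many1, char, digit, letter, spaces and parse are parsec's own.

```haskell
import Text.Parsec
  (ParseError, between, char, digit, letter, many1, parse, sepBy, spaces)
import Text.Parsec.String (Parser)

-- Run a token parser, then skip any trailing whitespace.
lexeme :: Parser a -> Parser a
lexeme p = p <* spaces

-- A rule parameterised by its element parser:
--   listOf p ::= '[' (p (',' p)*)? ']'
listOf :: Parser a -> Parser [a]
listOf p = between (lexeme (char '[')) (lexeme (char ']'))
                   (p `sepBy` lexeme (char ','))

number :: Parser Int
number = lexeme (read <$> many1 digit)

name :: Parser String
name = lexeme (many1 letter)

-- The same rule instantiated with a plain token...
numbers :: Parser [Int]
numbers = listOf number

-- ...and instantiated with itself, giving lists of lists.
nested :: Parser [[String]]
nested = listOf (listOf name)

-- Convenience runner for experimenting in GHCi.
runNumbers :: String -> Either ParseError [Int]
runNumbers = parse numbers ""
```

Because listOf is an ordinary function, nothing has to be re-generated to use it at a new element type, which is the kind of reuse that is awkward when the tool must emit fixed tables up front.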