Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Can Haskell's Parsec library be used to implement a recursive descent parser with backup?

I've been considering using Haskell's Parsec parsing library to parse a subset of Java as a recursive descent parser as an alternative to more traditional parser-generator solutions like Happy. Parsec seems very easy to use, and parse speed is definitely not a factor for me. I'm wondering, though, if it's possible to implement "backup" with Parsec, a technique which finds the correct production to use by trying each one in turn. For a simple example, consider the very start of the JLS Java grammar:

Literal:
    IntegerLiteral  
    FloatingPointLiteral

I'd like a way to not have to figure out how I should order these two rules to get the parse to succeed. As it stands, a naive implementation like this:

literal = do {
    x <- try (do { v <- integer; return (IntLiteral v)}) <|>
         (do { v <- float; return (FPLiteral v)});
    return(Literal x)
}

Will not work... inputs like "15.2" will cause the integer parser to succeed first, and then the whole thing will choke on the "." symbol. In this case, of course, it's obvious that you can solve the problem by re-ordering the two productions. In the general case, though, finding things like this is going to be a nightmare, and it's very likely that I'll miss some cases. Ideally, I'd like a way to have Parsec figure out stuff like this for me. Is this possible, or am I simply trying to do too much with the library? The Parsec documentation claims that it can "parse context-sensitive, infinite look-ahead grammars", so it seems like something like I should be able to do something here.

like image 671
Derek Thurn Avatar asked Mar 20 '10 14:03

Derek Thurn


People also ask

How do you implement a recursive descent parser?

The major approach of recursive-descent parsing is to relate each non-terminal with a procedure. The objective of each procedure is to read a sequence of input characters that can be produced by the corresponding non-terminal, and return a pointer to the root of the parse tree for the non-terminal.

What is the type of recursive descent parser?

In computer science, a recursive descent parser is a kind of top-down parser built from a set of mutually recursive procedures (or a non-recursive equivalent) where each such procedure implements one of the nonterminals of the grammar.

How does a recursive descent parser work?

Recursive descent is a top-down parsing technique that constructs the parse tree from the top and the input is read from left to right. It uses procedures for every terminal and non-terminal entity. This parsing technique recursively parses the input to make a parse tree, which may or may not require back-tracking.

Is LL parser recursive-descent?

A form of recursive-descent parsing that does not require any back-tracking is known as predictive parsing. It is also called as LL(1) parsing table technique since we would be building a table for string to be parsed. It has capability to predict which production is to be used to replace input string.


2 Answers

One way you can do this is to use the try combinator, which allows a parser to consume input and fail without failing the whole parse.

Another is to use Text.ParserCombinators.ReadP, which implements a symmetric choice operator, in which it is proven that a +++ b = b +++ a, so it really doesn't matter which order. I'm rather partial to ReadP, since it is minimal but provides what you need to build up a really powerful parser.

like image 60
luqui Avatar answered Oct 22 '22 18:10

luqui


Either use Parsec's notFollowedBy to ensure that integer consumed everything up to some token separator (this approach will scale to arbitrary scenario most of the time), or take a look at parser combinators that explore all possible parsing alternatives. First to come to mind is UU_Parsing library.

like image 38
ADEpt Avatar answered Oct 22 '22 17:10

ADEpt