Can Haskell's Parsec library be used to implement a recursive descent parser with backup?

Tags: computer-science, parsing, haskell, theory, parsec

I've been considering using Haskell's Parsec parsing library to parse a subset of Java as a recursive descent parser as an alternative to more traditional parser-generator solutions like Happy. Parsec seems very easy to use, and parse speed is definitely not a factor for me. I'm wondering, though, if it's possible to implement "backup" with Parsec, a technique which finds the correct production to use by trying each one in turn. For a simple example, consider the very start of the JLS Java grammar:

Literal:
    IntegerLiteral  
    FloatingPointLiteral

I'd like a way to not have to figure out how I should order these two rules to get the parse to succeed. As it stands, a naive implementation like this:

literal = do {
    x <- try (do { v <- integer; return (IntLiteral v)}) <|>
         (do { v <- float; return (FPLiteral v)});
    return(Literal x)
}

will not work. Inputs like "15.2" cause the integer parser to succeed first, and then the whole parse chokes on the "." symbol. In this case, of course, it's obvious that you can solve the problem by re-ordering the two productions. In the general case, though, finding things like this is going to be a nightmare, and it's very likely that I'll miss some cases. Ideally, I'd like a way to have Parsec figure out stuff like this for me. Is this possible, or am I simply trying to do too much with the library? The Parsec documentation claims that it can "parse context-sensitive, infinite look-ahead grammars", so it seems like I should be able to do something here.

asked Mar 20 '10 14:03

Derek Thurn

2 Answers

One way you can do this is to use the try combinator, which allows a parser to consume input and fail without failing the whole parse.

Another is to use Text.ParserCombinators.ReadP, which implements a symmetric choice operator, +++, for which it is proven that a +++ b = b +++ a, so the order in which you list the alternatives really doesn't matter. I'm rather partial to ReadP, since it is minimal but provides what you need to build up a really powerful parser.
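To make the ReadP suggestion concrete, here is a minimal sketch for the Literal example from the question. The Literal type, the helper names, and the simplified float syntax (digits, a dot, digits, with no exponent or suffix) are assumptions for illustration and are not taken from the JLS grammar.

```haskell
import Data.Char (isDigit)
import Text.ParserCombinators.ReadP

-- Hypothetical AST; the constructor names mirror the ones in the question.
data Literal = IntLiteral Integer | FPLiteral Double
  deriving (Show, Eq)

-- One or more decimal digits.
digits :: ReadP String
digits = munch1 isDigit

integerLit :: ReadP Literal
integerLit = IntLiteral . read <$> digits

-- Simplified float (an assumption): digits '.' digits.
floatLit :: ReadP Literal
floatLit = do
  whole <- digits
  _ <- char '.'
  frac <- digits
  return (FPLiteral (read (whole ++ "." ++ frac)))

-- Symmetric choice: both alternatives are explored, whichever is written first.
literal :: ReadP Literal
literal = integerLit +++ floatLit

-- Keep only the parses that consumed the entire input.
parseLiteral :: String -> [Literal]
parseLiteral s = [x | (x, "") <- readP_to_S (literal <* eof) s]
```

With these definitions, parseLiteral "15.2" should yield only the floating-point result, and swapping the operands of +++ should not change that. The price is that every alternative is explored, which is acceptable here since parse speed is not a concern.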

answered Oct 22 '22 18:10

luqui

Either use Parsec's notFollowedBy to ensure that integer has consumed everything up to some token separator (this approach scales to most scenarios), or take a look at parser combinators that explore all possible parsing alternatives. The first one that comes to mind is the UU_Parsing library.
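A minimal sketch of the notFollowedBy approach in Parsec, under the same simplifying assumptions as the question's example: the Literal type and parser names are hypothetical, and the float syntax is reduced to digits, a dot, and digits.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical AST; the constructor names mirror the ones in the question.
data Literal = IntLiteral Integer | FPLiteral Double
  deriving (Show, Eq)

-- Integer literal that refuses to match when a '.' follows, so "15.2"
-- is rejected here instead of being cut short at "15".
integerLit :: Parser Literal
integerLit = do
  ds <- many1 digit
  notFollowedBy (char '.')
  return (IntLiteral (read ds))

-- Simplified float (an assumption): digits '.' digits.
floatLit :: Parser Literal
floatLit = do
  whole <- many1 digit
  _ <- char '.'
  frac <- many1 digit
  return (FPLiteral (read (whole ++ "." ++ frac)))

-- try rewinds the input when an alternative fails after consuming digits,
-- so either alternative can be listed first for this pair.
literal :: Parser Literal
literal = try integerLit <|> try floatLit

parseLiteral :: String -> Either ParseError Literal
parseLiteral = parse (literal <* eof) "<input>"
```

The guard has to be written by hand for each token that is a prefix of another, so this does not make Parsec discover such conflicts automatically. It only makes each alternative reject input that belongs to the other.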

answered Oct 22 '22 17:10

ADEpt
