Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Parser library that can handle ambiguity

I'm looking for a mature parser library, either for Scala or Haskell. The most important point is, that the library can handle ambiguity. If an expression is ambiguous, I want every possible abstract syntax tree, that matches the expression. Simple example: The expression a ⊗ b ⊗ c can be seen as (a ⊗ b) ⊗ c or a ⊗ (b ⊗ c), and I need both variants. Thanks!

like image 252
schlicht Avatar asked Nov 07 '12 22:11

schlicht


3 Answers

I feel like the old guy for remembering when Walder's papers like Comprehending Monads (the precursor to the do notation) were exciting and new. The idea is that you (to quote) replace a failure by a list of successes, meaning maintain a list of all the possible parses. At the end you normally just take the first match, but with this setup, you can take all of them.

These aren't all that efficient for a deterministic parser, which is why they're less in fashion, but they are what you need.

Have a look at polyparse, and in particular Text.ParserCombinators.HuttonMeijer and Text.ParserCombinators.HuttonMeijerWallace.

(Hutton & Meijer translated the parser library to Haskell (from Gofer) and Wallace added extra features.)

Make sure you check it out on simple cases like parsing "aaaa" with

testP = do
   a <- many $ char 'a'
   b <- many $ char 'a'
   return (a,b)

to see if it has the semantics you seek.

You asked for mature. These libraries are part of pure functional programming's heritage! Having said that, I'd call parsec more mature, even though it's younger.

(Speculation: I don't think parsec can do what you want. Its standard choice combinator is deterministic. I haven't looked into tweaking or replacing that behaviour, and I wouldn't want to I'm afraid.)

like image 185
AndrewC Avatar answered Oct 31 '22 00:10

AndrewC


This question immediately reminded me of the Yacc is dead / No, it's not debate from the end of 2010. The authors of the Yacc is dead paper provide a library in Scala (unmaintained), Haskell and Racket. In the Yacc is alive response, Russ Cox points out that the code runs in exponential time for ambiguous grammars.

It's well-known that it is possible to parse ambiguous grammars in O(n^3), although obviously it can take exponential time to enumerate all the parse trees in the case that there are exponentially many of them -- and there will be in the case of x1 + x2 + x3 ... + xn. bison implements the GLR algorithm which does so; unfortunately, while bison is certainly mature (if not actually moribund), it is written neither in Haskell nor in Scala.

Daniel Spiewak implemented a GLL parser in Scala IIRC, but last time I looked at it, it suffered from some performance issues. So I'm not sure that it could be described as mature, either.

like image 4
rici Avatar answered Oct 30 '22 23:10

rici


I can't speak to how mature it is or give you any usage examples, but I've had the scala gll-combinators library open in a tab for a few days. It handles ambiguous grammars and looks pretty nifty.

like image 3
jberryman Avatar answered Oct 31 '22 01:10

jberryman