I'm looking for a mature parser library, either for Scala or Haskell. The most important point is, that the library can handle ambiguity. If an expression is ambiguous, I want every possible abstract syntax tree, that matches the expression. Simple example: The expression a ⊗ b ⊗ c can be seen as (a ⊗ b) ⊗ c or a ⊗ (b ⊗ c), and I need both variants. Thanks!
I feel like the old guy for remembering when Walder's papers like Comprehending Monads (the precursor to the do notation) were exciting and new. The idea is that you (to quote) replace a failure by a list of successes, meaning maintain a list of all the possible parses. At the end you normally just take the first match, but with this setup, you can take all of them.
These aren't all that efficient for a deterministic parser, which is why they're less in fashion, but they are what you need.
Have a look at polyparse, and in particular Text.ParserCombinators.HuttonMeijer
and Text.ParserCombinators.HuttonMeijerWallace
.
(Hutton & Meijer translated the parser library to Haskell (from Gofer) and Wallace added extra features.)
Make sure you check it out on simple cases like parsing "aaaa"
with
testP = do
a <- many $ char 'a'
b <- many $ char 'a'
return (a,b)
to see if it has the semantics you seek.
You asked for mature. These libraries are part of pure functional programming's heritage! Having said that, I'd call parsec more mature, even though it's younger.
(Speculation: I don't think parsec can do what you want. Its standard choice combinator is deterministic. I haven't looked into tweaking or replacing that behaviour, and I wouldn't want to I'm afraid.)
This question immediately reminded me of the Yacc is dead / No, it's not debate from the end of 2010. The authors of the Yacc is dead paper provide a library in Scala (unmaintained), Haskell and Racket. In the Yacc is alive response, Russ Cox points out that the code runs in exponential time for ambiguous grammars.
It's well-known that it is possible to parse ambiguous grammars in O(n^3)
, although obviously it can take exponential time to enumerate all the parse trees in the case that there are exponentially many of them -- and there will be in the case of x1 + x2 + x3 ... + xn
. bison
implements the GLR algorithm which does so; unfortunately, while bison
is certainly mature (if not actually moribund), it is written neither in Haskell nor in Scala.
Daniel Spiewak implemented a GLL parser in Scala IIRC, but last time I looked at it, it suffered from some performance issues. So I'm not sure that it could be described as mature, either.
I can't speak to how mature it is or give you any usage examples, but I've had the scala gll-combinators library open in a tab for a few days. It handles ambiguous grammars and looks pretty nifty.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With