Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Scala Parser Combinators tricks for recursive bnf?

Im trying to match this syntax:

pgm ::= exprs
exprs ::= expr [; exprs]
expr ::= ID | expr . [0-9]+

My scala packrat parser combinator looks like this:

import scala.util.parsing.combinator.PackratParsers
import scala.util.parsing.combinator.syntactical._

object Dotter extends StandardTokenParsers with PackratParsers {
    lexical.delimiters ++= List(".",";")
    def pgm = repsep(expr,";")
    def expr :Parser[Any]= ident | expr~"."~num
    def num = numericLit

       def parse(input: String) =
    phrase(pgm)(new PackratReader(new lexical.Scanner(input))) match {
      case Success(result, _) => println("Success!"); Some(result)
      case n @ _ => println(n);println("bla"); None
    }  

    def main(args: Array[String]) {
      val prg = "x.1.2.3;" +
            "y.4.1.1;" +
            "z;" +
            "n.1.10.30"


            parse(prg);
    }
}

But this doesnt work. Either it "matches greedy" and tells me:

[1.2] failure: end of input expected 
x.1.2.3;y.4.1.1;z;n.1.10.30

or if I change the | to a ||| I get a stackoverflow:

Exception in thread "main" java.lang.StackOverflowError
at java.lang.Character.isLetter(Unknown Source)
at java.lang.Character.isLetter(Unknown Source)
at scala.util.parsing.combinator.lexical.Lexical$$anonfun$letter$1.apply(Lexical.scala:32)
at scala.util.parsing.combinator.lexical.Lexical$$anonfun$letter$1.apply(Lexical.scala:32)
...

I kindoff understand why I get the errors; what can I do to parse a syntax like the above? It doesnt seem that esoteric to me

EDIT: Based on the paper referenced in http://scala-programming-language.1934581.n4.nabble.com/Packrat-parser-guidance-td1956908.html I found out that my program didnt actually use the new packrat parser.

Ie. change Parser[Any] to PackratParser[Any] and use lazy val instead of def

I rewrote the above to this:

import scala.util.parsing.combinator.PackratParsers
import scala.util.parsing.combinator.syntactical._

object Dotter extends StandardTokenParsers with PackratParsers {
    lexical.delimiters ++= List(".",";")
    lazy val pgm : PackratParser[Any] = repsep(expr,";")
    lazy val expr :PackratParser[Any]= expr~"."~num | ident
    lazy val num = numericLit

    def parse(input: String) =
    phrase(pgm)(new PackratReader(new lexical.Scanner(input))) match {
      case Success(result, _) => println("Success!"); Some(result)
      case n @ _ => println(n);println("bla"); None
    }  

    def main(args: Array[String]) {
      val prg = "x.1.2.3 ;" +
            "y.4.1.1;" +
            "z;" +
            "n.1.10.30"


            parse(prg);
    }
}
like image 225
svrist Avatar asked Jul 27 '10 12:07

svrist


2 Answers

The problem is (at least partially) that you're not actually using Packrat parsers. See the documentation for Scala's PackratParsers trait, which says

Using PackratParsers is very similar to using Parsers:

  • any class/trait that extends Parsers (directly or through a subclass) can mix in PackratParsers. Example: object MyGrammar extends StandardTokenParsers with PackratParsers
  • each grammar production previously declared as a def without formal parameters becomes a lazy val, and its type is changed from Parser[Elem] to PackratParser[Elem]. So, for example, def production: Parser[Int] = {...} becomes lazy val production: PackratParser[Int] = {...}
  • Important: using PackratParsers is not an all or nothing decision. They can be free mixed with regular Parsers in a single grammar.

I don't know enough about Scala 2.8's parser combinators to fix this entirely, but with the following modifications, I was able to get it to parse as far as the semicolon, which is an improvement over what you've accomplished.

object Dotter extends StandardTokenParsers with PackratParsers {
    lexical.delimiters ++= List(".",";")
    lazy val pgm:PackratParser[Any] = repsep(expr,";")
    lazy val expr:PackratParser[Any]= ident ||| (expr~"."~numericLit)

    def parse(input: String) = phrase(expr)(lex(input)) match {
      case Success(result, _) => println("Success!"); Some(result)
      case n @ _ => println(n);println("bla"); None
    }  

    def lex(input:String) = new PackratReader(new lexical.Scanner(input))
}
like image 118
Ken Bloom Avatar answered Oct 01 '22 02:10

Ken Bloom


The production

expr ::= ID | expr . [0-9]+

is left recursive. It expands to

expr ::= ID
expr ::= expr . [0-9]+

where the left recursion occurs on the 2nd line. This is what causes the parser to overflow the stack.

You should rewrite your grammar avoiding left recursive productions.

expr ::= ID {. [0-9]+}
like image 39
michid Avatar answered Oct 01 '22 01:10

michid