Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

parsing recursive structures in scala

I'm trying to contruct a parser in scala which can parse simple SQL-like strings. I've got the basics working and can parse something like:

select id from users where name = "peter" and age = 30 order by lastname

But now I wondered how to parse nested and classes, i.e.

select name from users where name = "peter" and (age = 29 or age = 30)

The current production of my 'combinedPredicate' looks like this:

def combinedPredicate = predicate ~ ("and"|"or") ~ predicate ^^ {
  case l ~ "and" ~ r => And(l,r)
  case l ~ "or" ~ r => Or(l,r)
}

I tried recursively referencing the combinedPredicate production within itself but that results in a stackoverflow.

btw, I'm just experimenting here... not implementing the entire ansi-99 spec ;)

like image 849
p3t0r Avatar asked Nov 26 '09 08:11

p3t0r


2 Answers

Well, recursion has to be delimited somehow. In this case, you could do this:

def combinedPredicate = predicate ~ rep( ("and" | "or" ) ~ predicate )
def predicate = "(" ~ combinedPredicate ~ ")" | simplePredicate
def simplePredicate = ...

So it will never stack overflow because, to recurse, it first has to accept a character. This is the important part -- if you always ensure recursion won't happen without first accepting a character, you'll never get into an infinite recursion. Unless, of course, you have infinite input. :-)

like image 167
Daniel C. Sobral Avatar answered Oct 27 '22 01:10

Daniel C. Sobral


The stack overflow you're experiencing is probably the result of a left-recursive language:

def combinedPredicate = predicate ~ ...
def predicate = combinedPrediacate | ...

The parser combinators in Scala 2.7 are recursive descent parsers. Recursive descent parsers have problems with these because they have no idea how the terminal symbol is when they first encounter it. Other kinds of parsers like Scala 2.8's packrat parser combinators will have no problem with this, though you'll need to define the grammar using lazy vals instead of methods, like so:

lazy val combinedPredicate = predicate ~ ...
lazy val predicate = combinedPrediacate | ...

Alternatively, you could refactor the language to avoid left recursion. From the example you're giving me, requiring parentheses in this language could solve the problem effectively.

def combinedPredicate = predicate ~ ...
def predicate = "(" ~> combinedPrediacate <~ ")" | ...

Now each deeper level of recursion corresponds to another parentheses parsed. You know you don't have to recurse deeper when you run out of parentheses.

like image 39
Ken Bloom Avatar answered Oct 27 '22 03:10

Ken Bloom