I'm trying to contruct a parser in scala which can parse simple SQL-like strings. I've got the basics working and can parse something like:
select id from users where name = "peter" and age = 30 order by lastname
But now I wondered how to parse nested and classes, i.e.
select name from users where name = "peter" and (age = 29 or age = 30)
The current production of my 'combinedPredicate' looks like this:
def combinedPredicate = predicate ~ ("and"|"or") ~ predicate ^^ {
case l ~ "and" ~ r => And(l,r)
case l ~ "or" ~ r => Or(l,r)
}
I tried recursively referencing the combinedPredicate production within itself but that results in a stackoverflow.
btw, I'm just experimenting here... not implementing the entire ansi-99 spec ;)
Well, recursion has to be delimited somehow. In this case, you could do this:
def combinedPredicate = predicate ~ rep( ("and" | "or" ) ~ predicate )
def predicate = "(" ~ combinedPredicate ~ ")" | simplePredicate
def simplePredicate = ...
So it will never stack overflow because, to recurse, it first has to accept a character. This is the important part -- if you always ensure recursion won't happen without first accepting a character, you'll never get into an infinite recursion. Unless, of course, you have infinite input. :-)
The stack overflow you're experiencing is probably the result of a left-recursive language:
def combinedPredicate = predicate ~ ...
def predicate = combinedPrediacate | ...
The parser combinators in Scala 2.7 are recursive descent parsers. Recursive descent parsers have problems with these because they have no idea how the terminal symbol is when they first encounter it. Other kinds of parsers like Scala 2.8's packrat parser combinators will have no problem with this, though you'll need to define the grammar using lazy val
s instead of methods, like so:
lazy val combinedPredicate = predicate ~ ...
lazy val predicate = combinedPrediacate | ...
Alternatively, you could refactor the language to avoid left recursion. From the example you're giving me, requiring parentheses in this language could solve the problem effectively.
def combinedPredicate = predicate ~ ...
def predicate = "(" ~> combinedPrediacate <~ ")" | ...
Now each deeper level of recursion corresponds to another parentheses parsed. You know you don't have to recurse deeper when you run out of parentheses.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With