I am relatively new to Scala and I'm trying to get a grasp of combinator parsing. So I have some code that parses logical atoms (predicates in first-order logic) like p(x,y,z)
or p(x,1,q(a,b,c),z)
. I have this little piece of code:
class Logic extends JavaTokenParsers {
def functor: Parser[Any] = ident
def term: Parser[Any] = predicate | ident | floatingPointNumber
def predicate: Parser[Any] = functor~"("~repsep(term,",")~")"
}
Functor
is the predicate symbol, like p
in p(x,y,z)
, term
is a constant or variable like 1
or x
, or a predicate
, and a predicate
is a compound term like p(x,y,z)
. This code works fine as is, but has problems if I change the order with which I declare the alternatives in the term
parser. That is, if I write the term
parser like
def term: Parser[Any] = ident | floatingPointNumber | predicate
(predicate
is the last alternative here, while it was first previously), then it fails parsing "nested" expressions like p(x,1,q(a,b,c),z)
(while it still works for "flat" expressions like p(x,y,z)
). Could someone please indicate what I am missing here?
Thanks a lot
The reason is A | B
will try to satisfy A
first and B
is tried only if A
fails.
This is different from |
in regular expressions (or context-free grammars in general).
In this case in "p(x,1,q(a,b,c),z)"
p
satisfies ident
so other alternatives won't be matched (and there is no backtracking).
For this reason, amongst alternatives with clashing prefixes, you can put the one accepting longer strings first (or the one you know that will result in satisfying more phrases).
Note that you can use |||
combinator:
def term: Parser[Any] = ident ||| floatingPointNumber ||| predicate
According to the Scala docs, |||
is a parser combinator matching alternatives with longest match composition.
I'm going assume you try to parse via logic.parse(logic.term, "p(x,1,q(a,b,c),z)")
.
In fact, the order of alternatives does matter, as the parser will choose the first alternative to match the given input. In your case, p
matches the ident
definition. The parser will then return a ParseResult containing the match (type Any
in your case) and the rest of the input, which is (x,1,q(a,b,c),z)
in your case.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With