Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Scala - Combinator Parsing, order of alternatives seems to matter

Tags:

parsing

scala

I am relatively new to Scala and I'm trying to get a grasp of combinator parsing. So I have some code that parses logical atoms (predicates in first-order logic) like p(x,y,z) or p(x,1,q(a,b,c),z). I have this little piece of code:

class Logic extends JavaTokenParsers {
    def functor: Parser[Any] = ident
    def term: Parser[Any] = predicate | ident | floatingPointNumber
    def predicate: Parser[Any] = functor~"("~repsep(term,",")~")"  
}

Functor is the predicate symbol, like p in p(x,y,z), term is a constant or variable like 1 or x, or a predicate, and a predicate is a compound term like p(x,y,z). This code works fine as is, but has problems if I change the order with which I declare the alternatives in the term parser. That is, if I write the term parser like

def term: Parser[Any] = ident | floatingPointNumber | predicate

(predicate is the last alternative here, while it was first previously), then it fails parsing "nested" expressions like p(x,1,q(a,b,c),z) (while it still works for "flat" expressions like p(x,y,z)). Could someone please indicate what I am missing here?

Thanks a lot

like image 535
nkatz Avatar asked Jan 08 '23 19:01

nkatz


2 Answers

Order Matters

The reason is A | B will try to satisfy A first and B is tried only if A fails.

This is different from | in regular expressions (or context-free grammars in general).

In this case in "p(x,1,q(a,b,c),z)" p satisfies ident so other alternatives won't be matched (and there is no backtracking).

For this reason, amongst alternatives with clashing prefixes, you can put the one accepting longer strings first (or the one you know that will result in satisfying more phrases).

Longest Alternative Matching

Note that you can use ||| combinator:

def term: Parser[Any] = ident ||| floatingPointNumber ||| predicate

According to the Scala docs, ||| is a parser combinator matching alternatives with longest match composition.

like image 159
Nader Ghanbari Avatar answered Jan 30 '23 13:01

Nader Ghanbari


I'm going assume you try to parse via logic.parse(logic.term, "p(x,1,q(a,b,c),z)").

In fact, the order of alternatives does matter, as the parser will choose the first alternative to match the given input. In your case, p matches the ident definition. The parser will then return a ParseResult containing the match (type Any in your case) and the rest of the input, which is (x,1,q(a,b,c),z) in your case.

like image 30
Conrad Avatar answered Jan 30 '23 12:01

Conrad