Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Parsing unordered sequence with parsing expression grammar

Is there a (simple) way, within a parsing expression grammar (PEG), to express an "unordered sequence"? A rule such as

Rule <- A B C

requires A, B and C to match in order. A rule such as

Rule <- (A B C) / (B C A) / (C A B) / (A C B) / (C B A) / (B A C)

allows them to match in any order (which is what we want) but it is cumbersome and inapplicable in practice with more terms in the sequence.

Is the only solution to use a syntactically looser rule such as

Rule <- (A / B / C){3}

and semantically check that each rule matches only once?

The fact that, e.g., Relax NG Compact Syntax has an "unordered list" operator to parse XML make me hint that there is no obvious solution.

Last question: do you think the addition of such an operator would bring ambiguity to PEG?

like image 270
migou Avatar asked Nov 05 '22 18:11

migou


1 Answers

Grammar rules express precisely the sequence of forms that you want, regardless of parsing engine (e.g., PEG, LALR, LL(k), ...) that you choose.

The only way to express that you want all possible sequences of just of something using BNF rules is the big ugly rule you proposed.

The standard solution is to simply define:

rule <- (A | B | C)* 

(or whatever syntax your parser generator accepts for lists) and semantically count that only 3 forms are provided and they are unique.

Often people building parser generators add special "extended BNF" notations to let them describe special circumstances; you gave an example use {3} as special syntax implying that you only wanted "3 of" under the assumption the parser generator accepts this notation and does the appropriate enforcement. One can imagine an extension notation {unique} to let you describe your situation. I've never seen a parser generator that implemented that idea.

like image 77
Ira Baxter Avatar answered Nov 14 '22 23:11

Ira Baxter