Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Solving parsing conflicts in a tiny Lemon grammar

I'm trying to learn basics of the Lemon parser generator, but I stuck quickly.

Here's a tiny grammar:

%right PLUS_PLUS.
%left DOT.

program ::= expr.

member_expr ::= expr DOT IDENTIFIER.

lhs_expr ::= member_expr.

expr ::= lhs_expr.
expr ::= PLUS_PLUS lhs_expr.

It causes 1 parsing conflict:

State 3:
      (3) expr ::= lhs_expr *
      (4) expr ::= PLUS_PLUS lhs_expr *

                           DOT reduce       3      expr ::= lhs_expr
                           DOT reduce       4       ** Parsing conflict **
                     {default} reduce       4      expr ::= PLUS_PLUS lhs_expr

Whereas, if I rewrite the last rule as follows:

expr ::= PLUS_PLUS expr DOT IDENTIFIER.

Then it causes no conflicts. But I don't think that this is the right way to go.

I'd appreciate if someone could explain what's the right way, and why.

like image 533
Dmitry Frank Avatar asked Jan 06 '17 22:01

Dmitry Frank


1 Answers

So you wrote an ambiguous grammar, which says to accept:

 ++ x . y

with two interpretations:

 [++ x ] . y

and

 ++ [x . y]

where the [ ] are just my way to show to the groupings.

Lemon is an L(AL)R parser, and such parsers simply don't handle ambiguities (multiple interpretations). The reduce-reduce conflict reported is what happens when the parser hits that middle dot; does it group "++ x" as "[++ x] ." or as "++ [ x .]"? Both choices are valid, and it can't choose safely.

If you stick with Lemon (or another LALR parser generator), you have to get rid of the problem by changing the grammar. [You could use a GLR parser generator; it would accept and give you both parses. But all you've done then is to push the problem of resolving the ambiguity to the post-parsing phrase. As you don't want an ambiguity, you might as well avoid it during parsing if you can. In this case I think you can.]

I think you are trying to build a C-like language. So you want something like this:

primitive_target ::= IDENTIFIER ;
primitive_target ::= IDENTIFIER '[' expr ']' ;
access_path ::= primitive_target ;
access_path ::= access_path '.' primitive_target ;

lhs ::= access_path ;
lhs ::= PLUS_PLUS access_path ;
lhs ::= access_path PLUS_PLUS ;

program ::= expr ;

expr ::= term ;
expr ::= expr '+' term ;
term :::= '(' expr ')' ;
term ::= lhs ;
term ::= lhs '=' expr ;
term ::= constant ;
like image 81
Ira Baxter Avatar answered Oct 23 '22 03:10

Ira Baxter