Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Debugging in Jison

I'm using Jison to write a parser. This is my grammar:

{
    "program": [
        ["statements EOF", "return $1;"]
    ],
    "statements": [
        ["statement",            "$$ = $1;"],
        ["statements statement", "$$ = $1 + '\\n' + $2;"]
    ],
    "statement": [
        ["expression NEWLINE", "$$ = $1 + ';';"]
    ],
    "expression": [
        ["NUMBER",                "$$ = yytext;"],
        ["expression expression", "$$ = $1 + ', ' + $2;"]
    ]
}

When I run it however I get the following error message:

Conflict in grammar: multiple actions possible when lookahead token is NUMBER in
state 9
- reduce by rule: expression -> expression expression
- shift token (then go to state 5)

States with conflicts:
State 9
  expression -> expression expression . #lookaheads= NEWLINE NUMBER
  expression -> expression .expression
  expression -> .NUMBER
  expression -> .expression expression

What am I supposed to make of this debug message? How would you explain this message in simple English? What does the period in expression -> expression expression . mean? What are .expression and .NUMBER? How are they different from expression and NUMBER respectively?

like image 366
Aadit M Shah Avatar asked Dec 27 '22 06:12

Aadit M Shah


1 Answers

What am I supposed to make of this debug message?

A grammar conflict means the parser can reach a state where it could follow multiple rules, but it doesn't have enough information to determine which one to follow (or worse, the grammar is ambiguous). You'll have to tweak the grammar to eliminate the conflicts. Often, this just takes practice to get right.

What does the period in expression -> expression expression . mean?

The period represents the position of the parser. So, in that rule, the parser would have just parsed two expressions, and is now in state 9. When the period is at the end of the rule, that means the rule can be "reduced", and grouped into a single expression non-terminal in this case. However, it can only reduce if the next token (the lookahead) is a NEWLINE or a NUMBER.

In expression -> .NUMBER, the parser has just encountered a NUMBER token, which it can "shift", then move to a new state.

The conflict occurs because the parser can reduce or shift when it encounters a NUMBER token.

Edit: To resolve your conflict, we need to split that expression rule into distinct non-terminals. Having the same non-terminal in sequence is bound to produce conflicts.

e.g.

{
    "program": [
        ["statements EOF", "return $1;"]
    ],
    "statements": [
        ["statement",            "$$ = $1;"],
        ["statements statement", "$$ = $1 + '\\n' + $2;"]
    ],
    "statement": [
        ["expression NEWLINE", "$$ = $1 + ';';"]
    ],
    "expression": [
        ["expression expression_base", "$$ = $1 + ', ' + $2;"],
        ["expression_base", "$$ = $1;"]
    ],
    "expression_base": [
        ["NUMBER",                "$$ = yytext;"]
    ]
}

Here's a nice resource for more background on these types of grammars.

like image 109
user2247701 Avatar answered Jan 05 '23 23:01

user2247701