I'm using Jison (Bison) to create a simple markup language. I'm clearly new to this, but slight variations are working very well. I just don't understand the source of the S/R conflict.
It doesn't seem matter that 'Text' is returned by two lexer actions (with different Start Conditions) and I like this because it seems to allow the grammar to have fewer rules and because the error messages to the user are consistent. I've tried making the 'Text' rule common regardless of context and I've also tried giving each token a different name, but it doesn't seem to have any effect on the S/R Conflicts when it's all together.
The parser is SUPPOSED to create a json-object with plain-text, sub-arrays, and various special nodes.
Specification:
/* lexical grammar */
%lex
%s bracketed
%%
<bracketed>(\\.|[^\\\,\[\]])+ { yytext = yytext.replace(/\\(.)/g, '$1'); return 'Text'; }
<INITIAL>(\\.|[^\\\[])+ { yytext = yytext.replace(/\\(.)/g, '$1'); return 'Text'; }
"[" { this.begin('bracketed'); return '['; }
"]" { this.popState(); return ']'; }
"," return ','
<<EOF>> return 'END'
/lex
%start template
%%
template
: sentence END
;
sentence
: /* empty */
| sentence Text
| sentence '[' ']'
| sentence '[' dynamic ']'
;
dynamic
: sentence
/*| dynamic ',' sentence*/
;
Warnings:
Conflict in grammar: multiple actions possible when lookahead token is ] in state 5
- reduce by rule: sentence ->
- shift token (then go to state 6)
States with conflicts:
State 5
sentence -> sentence [ .] #lookaheads= END Text [ ]
sentence -> sentence [ .dynamic ] #lookaheads= END Text [ ]
dynamic -> .sentence #lookaheads= ]
sentence -> . #lookaheads= ] Text [
sentence -> .sentence Text
sentence -> .sentence [ ]
sentence -> .sentence [ dynamic ]
Different generator algorithms have more or less trouble, but they all seem to have trouble.
Thanks!
The conflict comes fundamentally from these two rules:
sentence: sentence '[' Text ']'
| sentence '[' sentenceList ']'
The reason is that after seeing a sentence
and a [
and looking at the next token being Text
, the parser doesn't known whether to shift the Text
, matching the first rule, or to treat that Text
as the beginning of a sentenceList
going towards matching the second rule.
Now if you had a parser generator that use 2-token lookahead, this wouldn't be a problem, but bison is LALR(1) (the 1 being one token lookahead).
There are a couple of things you could try:
do extra lookahead in the lexer to differentiate Text-followed-by-] from Text-not-followed-by-] as two distinct tokens then rewrite the rules to use both of these tokens.
Use bison's %glr-parser feature to use GLR parser. This will parse the sentence both ways and later throw away the one that doesn't match
refactor the grammar to not need 2-token lookahead.
One refactoring that works in your case would be to rewrite the sentence
rules to make them all right-recursive instead of left-recursive:
sentence: /* empty */
| Text sentence
| '[' ']' sentence
| '[' Text ']' sentence
| '[' sentenceList ']' sentence
;
This avoids having sentence
(or any other rule that starts with sentence
such as sentenceList
) start with a null reduction of the sentence: /*empty*/
rule. So the parser can freely shift a Text
in the problematic case deferring the reduction until it sees the next token. It does have memory use implications, however, as it results in a parser that will essentially shift the entire input on to the parser stack and then reduce it one sentence at a time.
Another refactor you could do would be to subsume the [Text]
and []
constructs into the [sentenceList]
:
sentence: /* empty */
| sentence Text
| sentence '[' sentenceList ']'
;
sentenceList: sentence
| sentenceList ',' sentence
So now a sentenceList
is one or more sentences separated by commas (instead of two or more), and in the action for the sentence '[' sentenceList ']'
rule, you'd check the sentenceList
to see if it was two or more sentences and act appropriately.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With