Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How does the yacc/bison LALR(1) algorithm treat "empty" rules?

Tags:

parsing

yacc

lalr

In a LALR(1) parser, the rules in the grammar are converted into a parse table that effectively says "If you have this input so far, and the lookahead token is X, then shift to state Y, or reduce by rule R".

I have successfully constructed a LALR(1) parser in an interpreted language (ruby), not using a generator, but computing a parse table at runtime and evaluating the input using that parse table. This works surprisingly well and the table generation is quite trivial (which surprised me somewhat), supporting self-referential rules and left/right association.

One thing I am having some difficulty to understand, however, is how yacc/bison conceptually processes empty rule definitions. My parser can't handle them, since in generating the table it looks at each symbol in each rule, recursively, and "empty" is just not something that will come from the lexer, nor be reduced by a rule. So how then, do LALR(1) parsers process the empty rule? Do they treat it specially, or is it a "natural" concept that a valid algorithm should just work with, without even needing to have particular awareness of such a concept?

Let's say, a rule that can match any number of paired parentheses with nothing in the middle:

expr:   /* empty */
      | '(' expr ')'
      ;

Input like the following would match this rule:

((((()))))

This means that upon reading '(' and seeing ')' in the lookahead token, the parser choices:

  1. Shift the ')' (not possible)
  2. Reduce the input according to some other rule (not possible)
  3. Something else...

don't quite fit into the core algorithm of "shift" or "reduce". The parser effectively needs to shift nothing onto the stack, reduce "nothing" to expr, then shift the next token ')', giving '(' expr ')', which of course reduces to expr, and so on.

It's the "shift nothing" that's confusing me. How does the parse table convey such a concept? Consider also that it should be possible to invoke some semantic action that returns a value to $$ on reducing the empty value, so a rather simplistic view of just skipping that from the parse table and saying that '(' on the stack and ')' in the lookahead should simply translate to a shift, would not genuinely produce the sequence '(' expr ')', but would simply produce the sequence '(' ')'.

like image 529
d11wtq Avatar asked Nov 23 '11 12:11

d11wtq


People also ask

How does yacc work?

The input to yacc describes the rules of a grammar. yacc uses these rules to produce the source code for a program that parses the grammar. You can then compile this source code to obtain a program that reads input, parses it according to the grammar, and takes action based on the result.

What is bison programming?

Bison is a general-purpose parser generator that converts an annotated context-free grammar into a deterministic LR or generalized LR (GLR) parser employing LALR (1) parser tables. As an experimental feature, Bison can also generate IELR (1) or canonical LR(1) parser tables.

Is Bison context-free grammar?

Bison takes as input a context-free grammar specification and produces a C-language function that recognizes correct instances of the grammar. The Bison grammar input file conventionally has a name ending in `. y' . Grammar Outline: Overall layout of the grammar file.

What kind of parser is Bison?

Bison is a general-purpose parser generator that converts a grammar description (Bison Grammar Files) for an LALR(1) context-free grammar into a C program to parse that grammar. The Bison parser is a bottom-up parser.


1 Answers

Despite thinking about this for days, since thinking this through when writing the question and in the minutes that followed, something just hit me as so incredibly obvious and simple.

Reduction by all rules is always: pop off X inputs from the stack, where X is the number of components in the rule, then shift the result back onto the stack and goto whatever state is given in the table after that reduction.

In the case of the empty rule, you don't need to consider that "empty" is even a concept. The parse table simply needs to include a transition that says "given '(' on the stack and 'anything that is not '(' in the lookahead, reduce by the 'empty' rule". Now since the empty rule has a size of zero components, popping zero from the stack means the stack doesn't change, then when the result of reducing nothing is shifted onto the stack, you're looking at something that does indeed appear in the grammar and everything becomes clear.

Stack       Lookahead    Remaining Input      Action
--------------------------------------------------------------
$           (            ())$                 Shift '('
$(          (            ))$                  Shift '('
$((         )            )$                   Reduce by /* empty */
$((expr     )            )$                   Shift ')'
$((expr)    )            $                    Reduce by '(' expr ')'
$(expr      )            $                    Shift ')'
$(expr)     $                                 Reduce by '(' expr ')'
$expr                                         Accept

The reason it "just works" is because in order to reduce by the empty rule you only have to pop zero items from the stack.

like image 54
d11wtq Avatar answered Sep 20 '22 06:09

d11wtq