How does the yacc/bison LALR(1) algorithm treat "empty" rules?

Tags:

In a LALR(1) parser, the rules in the grammar are converted into a parse table that effectively says "If you have this input so far, and the lookahead token is X, then shift to state Y, or reduce by rule R".

I have successfully constructed a LALR(1) parser in an interpreted language (ruby), not using a generator, but computing a parse table at runtime and evaluating the input using that parse table. This works surprisingly well and the table generation is quite trivial (which surprised me somewhat), supporting self-referential rules and left/right association.

One thing I am having some difficulty to understand, however, is how yacc/bison conceptually processes empty rule definitions. My parser can't handle them, since in generating the table it looks at each symbol in each rule, recursively, and "empty" is just not something that will come from the lexer, nor be reduced by a rule. So how then, do LALR(1) parsers process the empty rule? Do they treat it specially, or is it a "natural" concept that a valid algorithm should just work with, without even needing to have particular awareness of such a concept?

Let's say, a rule that can match any number of paired parentheses with nothing in the middle:

expr:   /* empty */
      | '(' expr ')'
      ;

Input like the following would match this rule:

((((()))))

This means that upon reading '(' and seeing ')' in the lookahead token, the parser choices:

Shift the ')' (not possible)
Reduce the input according to some other rule (not possible)
Something else...

don't quite fit into the core algorithm of "shift" or "reduce". The parser effectively needs to shift nothing onto the stack, reduce "nothing" to expr, then shift the next token ')', giving '(' expr ')', which of course reduces to expr, and so on.

It's the "shift nothing" that's confusing me. How does the parse table convey such a concept? Consider also that it should be possible to invoke some semantic action that returns a value to $$ on reducing the empty value, so a rather simplistic view of just skipping that from the parse table and saying that '(' on the stack and ')' in the lookahead should simply translate to a shift, would not genuinely produce the sequence '(' expr ')', but would simply produce the sequence '(' ')'.

529

asked Nov 23 '11 12:11

d11wtq

1 Answers

Despite thinking about this for days, since thinking this through when writing the question and in the minutes that followed, something just hit me as so incredibly obvious and simple.

Reduction by all rules is always: pop off X inputs from the stack, where X is the number of components in the rule, then shift the result back onto the stack and goto whatever state is given in the table after that reduction.

In the case of the empty rule, you don't need to consider that "empty" is even a concept. The parse table simply needs to include a transition that says "given '(' on the stack and 'anything that is not '(' in the lookahead, reduce by the 'empty' rule". Now since the empty rule has a size of zero components, popping zero from the stack means the stack doesn't change, then when the result of reducing nothing is shifted onto the stack, you're looking at something that does indeed appear in the grammar and everything becomes clear.

Stack       Lookahead    Remaining Input      Action
--------------------------------------------------------------
$           (            ())$                 Shift '('
$(          (            ))$                  Shift '('
$((         )            )$                   Reduce by /* empty */
$((expr     )            )$                   Shift ')'
$((expr)    )            $                    Reduce by '(' expr ')'
$(expr      )            $                    Shift ')'
$(expr)     $                                 Reduce by '(' expr ')'
$expr                                         Accept

The reason it "just works" is because in order to reduce by the empty rule you only have to pop zero items from the stack.

answered Sep 20 '22 06:09

d11wtq

Related questions
                            
                                How to keep whitespace before document element when parsing with Java?
                            
                                PHP DOMDocument getting Attribute of Tag
                            
                                Latex - extract substring/ignore characters
                            
                                Correct ReadP usage in Haskell
                            
                                How can I eliminate the 'main' routine of the flex & bison so I can put the lexing and parsing process into a library?
                            
                                malformed JSON string, neither array, object, number, string or atom
                            
                                How can I convert a string containing a character escape sequence into a char?
                            
                                Filehelpers CSV parsing. How to use FieldQuoted attribute?
                            
                                Get comments in a php file
                            
                                Text File data parsing lines and output as columns
                            
                                sscanf() and locales. How does one really parse things like "3.14"?
                            
                                Parse json with variable key
                            
                                Is it possible to have two or more Lex/Yacc parsers in the same application
                            
                                Parsing HTML document: Regular expression or LINQ?
                            
                                How can I merge CSS definitions in files into inline style attributes, using Perl?
                            
                                bindings not resolving with AST processing in eclipse
                            
                                Parsing in Haskell for a simple interpreter
                            
                                Parsing a chemical formula from a string in C#? [duplicate]
                            
                                SAX parser: Ignoring special characters
                            
                                Any build-in JSON.parse in MongoDB?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

How does the yacc/bison LALR(1) algorithm treat "empty" rules?

Tags:

parsing

yacc

lalr

d11wtq

People also ask

1 Answers

d11wtq

Recent Activity

Donate For Us