Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Difference between shifting and look-ahead

Given a simple grammar, like

rule1
    := token1 token2 token3 token4
    || token1 token2 token3 token3;

What's the difference between shifting the first three tokens, then looking at the fourth to see which rule to reduce, and simply performing a lookahead of three tokens to see which rule to reduce?

like image 433
Puppy Avatar asked Dec 12 '11 23:12

Puppy


1 Answers

In a shift/reduce parser, the lookahead is not used to determine which production is being considered, but rather to decide whether the parser should shift the next token or take some sort of reduce action. If you had a shift/reduce parser for the above grammar, the parser would always shift four tokens before deciding whether or not to reduce; remember that in LR parsers, reductions only are performed when the appropriate series of symbols is atop the parsing stack. Lookahead would only be necessary here if the parser couldn't tell whether it should reduce the four tokens it had or to keep shifting more symbols and reduce later on.

Specifically, the parser would probably do something like this:

Stack                           Input                        Action
-------------------------------------------------------------------------------
                                token1 token2 token3 token4  Shift
token1                                 token2 token3 token4  Shift
token1 token2                                 token3 token4  Shift
token1 token2 token3                                 token4  Shift
token1 token2 token3 token4                                  Reduce, Option 1
rule1                                                        Accept

Or

Stack                           Input                        Action
-------------------------------------------------------------------------------
                                token1 token2 token3 token3  Shift
token1                                 token2 token3 token3  Shift
token1 token2                                 token3 token3  Shift
token1 token2 token3                                 token3  Shift
token1 token2 token3 token3                                  Reduce, Option 2
rule1                                                        Accept

Note that this contrasts with top-down parsers like LL(k) parsers, which work by trying to predict the production to use. In that case, four lookahead tokens would be required, because the parser is guessing the production and then checking its guess (predict/match parsing). For example, in a top-down parser (which would have to be LL(4) here), it would do the following:

Stack                           Input                             Action
----------------------------------------------------------------------------------
rule1                           token1 token2 token3 token4 $$$$  Predict, Option 1
token1 token2 token3 token4     token1 token2 token3 token4 $$$$  Match
token2 token3 token4            token2 token3 token4 $$$$         Match
token3 token4                   token3 token4 $$$$                Match
token4                          token4 $$$$                       Match
                                $$$$                              Accept

Or

Stack                           Input                             Action
----------------------------------------------------------------------------------
rule1                           token1 token2 token3 token3 $$$$  Predict, Option 2
token1 token2 token3 token3     token1 token2 token3 token3 $$$$  Match
token2 token3 token3            token2 token3 token3 $$$$         Match
token3 token3                   token3 token3 $$$$                Match
token3                          token3 $$$$                       Match
                                $$$$                              Accept

Notice how the lookahead is needed to predict which production to use, so the parser must have four tokens of lookahead. In an LR parser, the parser works by inspecting more tokens until it's comfortable that it's seen what it's looking for, then reduces (shift/reduce parsing). In this case, no lookahead is required at all. Lookahead is only required in an LR parser to determine whether the parser has seen the end of the handle (the string to reduce), or whether it's in the middle of the handle and has to keep shifting. This is why, for example, some interesting grammars can be shown to be LR(0), but the only grammars that are LL(0) are grammars in which each nonterminal has exactly one production associated with it; the lookahead has fundamentally different uses in top-down versus bottom-up parsing.

Generally speaking, top-down parsers can handle fewer grammars than bottom-up parsers, and in fact any LL(k) grammar is guaranteed to be LR(k) but not the other way around.

Hope this helps!

like image 189
templatetypedef Avatar answered Nov 10 '22 14:11

templatetypedef