Given a simple grammar, like
rule1
:= token1 token2 token3 token4
|| token1 token2 token3 token3;
What's the difference between shifting the first three tokens, then looking at the fourth to see which rule to reduce, and simply performing a lookahead of three tokens to see which rule to reduce?
In a shift/reduce parser, the lookahead is not used to determine which production is being considered, but rather to decide whether the parser should shift the next token or take some sort of reduce action. If you had a shift/reduce parser for the above grammar, the parser would always shift four tokens before deciding whether or not to reduce; remember that in LR parsers, reductions only are performed when the appropriate series of symbols is atop the parsing stack. Lookahead would only be necessary here if the parser couldn't tell whether it should reduce the four tokens it had or to keep shifting more symbols and reduce later on.
Specifically, the parser would probably do something like this:
Stack Input Action
-------------------------------------------------------------------------------
token1 token2 token3 token4 Shift
token1 token2 token3 token4 Shift
token1 token2 token3 token4 Shift
token1 token2 token3 token4 Shift
token1 token2 token3 token4 Reduce, Option 1
rule1 Accept
Or
Stack Input Action
-------------------------------------------------------------------------------
token1 token2 token3 token3 Shift
token1 token2 token3 token3 Shift
token1 token2 token3 token3 Shift
token1 token2 token3 token3 Shift
token1 token2 token3 token3 Reduce, Option 2
rule1 Accept
Note that this contrasts with top-down parsers like LL(k) parsers, which work by trying to predict the production to use. In that case, four lookahead tokens would be required, because the parser is guessing the production and then checking its guess (predict/match parsing). For example, in a top-down parser (which would have to be LL(4) here), it would do the following:
Stack Input Action
----------------------------------------------------------------------------------
rule1 token1 token2 token3 token4 $$$$ Predict, Option 1
token1 token2 token3 token4 token1 token2 token3 token4 $$$$ Match
token2 token3 token4 token2 token3 token4 $$$$ Match
token3 token4 token3 token4 $$$$ Match
token4 token4 $$$$ Match
$$$$ Accept
Or
Stack Input Action
----------------------------------------------------------------------------------
rule1 token1 token2 token3 token3 $$$$ Predict, Option 2
token1 token2 token3 token3 token1 token2 token3 token3 $$$$ Match
token2 token3 token3 token2 token3 token3 $$$$ Match
token3 token3 token3 token3 $$$$ Match
token3 token3 $$$$ Match
$$$$ Accept
Notice how the lookahead is needed to predict which production to use, so the parser must have four tokens of lookahead. In an LR parser, the parser works by inspecting more tokens until it's comfortable that it's seen what it's looking for, then reduces (shift/reduce parsing). In this case, no lookahead is required at all. Lookahead is only required in an LR parser to determine whether the parser has seen the end of the handle (the string to reduce), or whether it's in the middle of the handle and has to keep shifting. This is why, for example, some interesting grammars can be shown to be LR(0), but the only grammars that are LL(0) are grammars in which each nonterminal has exactly one production associated with it; the lookahead has fundamentally different uses in top-down versus bottom-up parsing.
Generally speaking, top-down parsers can handle fewer grammars than bottom-up parsers, and in fact any LL(k) grammar is guaranteed to be LR(k) but not the other way around.
Hope this helps!
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With