I am having difficulties understanding the principle of lookahead in LR(1) - items. How do I compute the lookahead sets?
Say for an example that I have the following grammar:
S -> AB
A -> aAb | b
B -> d
Then the first state will look like this:
S -> .AB , {look ahead}
A -> .aAb, {look ahead}
A -> .b, {look ahead}
I know what look aheads are, but I don't know how to compute them. I have googled for answers but couldn't find a webpage which explains this in a simple manner.
Thanks in advance
LR (1) item is a collection of LR (0) items and a look ahead symbol. The look ahead is used to determine that where we place the final item. The look ahead always add $ symbol for the argument production.
Definition 1. An LR(1) item of a CFG G is a string of the form A → [α • β,a], where A → αβ is a production in G, and a is a terminal of G or the special symbol ε. Just like in the LR(0) algorithm, we first want to construct an NFA NG that tells us how items will be updated on shift transitions.
Difference between LR(0) and LR(1) algorithmLR(0)requires just the canonical items for the construction of a parsing table, but LR(1) requires the loookahead symbol as well. LR(1) allows us to choose the correct reductions and allows the use of grammar that are not uniquely invertible while LR(0) does not.
We may inspect the deterministic version of this automaton to determine whether the grammar is LR(1). It is LR(1) only if there are no reduce/reduce or shift/reduce conflicts. Here a shift/reduce conflict happens when a completed item in a state has its "follow token" also occurring as a shift from that state.
I'll write down the first two states for your example:
S -> AB
A -> aAb | b
B -> d
(1) S -> .AB, {$} # Once we have done this rule it's EOF ($)
(2) A -> .aAb, {d} # from (1), after A there has to be a B whose first symbol has to be d
(3) A -> .b, {d} # as above
(4) A -> a.Ab, {d} # from (2)
(5) A -> .aAb, {b} # from (4), the symbol after the A has to be b
(6) A -> .b, {b} # from (4), as above
(7) A -> b., {d} # from (3)
(8) S -> A.B, {$} # from (1) and (7)
(9) B -> .B, {$} # from (8)
and so on, keep following the same shift/reduce/closure as you would for an LR(0) parser, but keep track of (lookahead for) the next symbol...
(State 2+ are longer, I don't recommend working them out by hand!)
I suggest looking into Udacity's Programming Languages course to learn more about lexing and parsing. There is also an example on wikipedia and a related SO question.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With