I'm having trouble building the collection of sets of items for LR(1) parsers with a grammar containing epsilon productions. For example, given the following grammar (where eps stands for epsilon)
S -> a S U
U -> b
| eps
State0 would be
S' -> .S, $
S -> .a S U, $
Moving with 'a' from State0 would give the following state, let's call it State2
S -> a .S U, $
S -> .a S U, $/???
In order to have the lookahead for the second item of State2 I need to calculate FIRST(U$). I know that FIRST(U) = {'b', eps}. My first question is: the lookaheads of the second item of State2 are $ and 'b'? Since U can be eps, my brain tells me that I can have $ as a lookahead as well, not just 'b'. It would have been just 'b' if FIRST(U) would have been just {'b'}. Is that correct?
Second question: at some point I will have a state as the following one
S -> a S .U, $
U -> .b, $
U -> .eps, $
What do I do here? Do I need to move with eps and have a set with the item U -> eps., $
? What if I have another terminal as lookahead, i.e. X -> .eps, a/$
? And if I move, ending up having a set of the form X -> eps., $
, do I reduce?
And more: do I need to insert eps in the parse table as a symbol?
Thanks
FIRST(U$)
means "the set of symbols which could be first in a derivation of U$
". Clearly, if U
can derive the empty string, $
must be part of this set. The end-of-input marker $
ensures that we never have to worry about epsilons in the FIRST
sets. (If we were doing LR(k) instead of LR(1), we would use k
end markers so that all the strings in FIRSTk
had length k
.
The item associated with U →
(or with U → ε
if you insist) is U → •
. In other words, it is reducible and should trigger a reduce action on matching lookahead.
ε
is not a symbol; we only use it (sometimes) to make the empty string visible. But the empty string is empty.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With