I don't understand one of the examples provided by my tutor.
Example
S ::= aBA | BB | Bc
A ::= Ad | d
B ::= ε
We have
FIRST(B) = FIRST(ε)
= {ε}
FIRST(A) = FIRST(Ad) ∪ FIRST(d)
= FIRST(A) ∪ {d}
= {d}
FIRST(S) = FIRST(aBA) ∪ FIRST(BB) ∪ FIRST(Bc)
= FIRST(a) ∪ (FIRST(B)\{ε}) ∪ FIRST(B) ∪ (FIRST(B)\{ε) ∪ FIRST(c)
= {a, ε, c}
Why is there a FIRST(B) in the FIRST(S) calculation? Shouldn't it be
(FIRST(B)\{ε)?
Why is A missing from FIRST(S) calculation?
This page gives the mechanical rules for deriving FIRST (and FOLLOW) sets. I'll try to explain the logic behind these rules and how they apply to your example.
FIRST(u)
is the set of terminals that can occur first in a full derivation of u
, where u
is a sequence of terminals and non-terminals. In other words, when calculating the FIRST(u)
set, we are looking only for the terminals that could possibly be the first terminal of a string that can be derived from u
.
Given the definition, we can see that FIRST(aBA)
reduces to FIRST(a)
, then to a
. This is because no matter what the A
and B
productions are, the terminal a
will always occur first in anything derived from aBA
since a
is a terminal, and can't be removed from the front of that sequence.
I'm going to skip FIRST(BB)
for now and move on to FIRST(Bc)
. Things are different here, since B
is a non-terminal. At first, we say that anything in FIRST(B)
is also in FIRST(S)
. Unfortunately, FIRST(B)
contains ε
which causes problems, as we could have the scenario
FIRST(Bc)
-> FIRST(εc)
= FIRST(c)
= c
where the arrow is a possible derivation/reduction. In general, we therefore say that FIRST(Xu)
, where ε
is in FIRST(X)
, is equal to (FIRST(X)\{ε}) ∪ FIRST(u)
. This explains the last two terms in your calculation.
Using the above rule, we can now derive FIRST(BB)
as (FIRST(B)\{ε}) ∪ FIRST(B)
. Similarly, if we were calculating FIRST(BBB)
we would reduce it as
FIRST(BBB)
= (FIRST(B)\{ε}) ∪ FIRST(BB)
= (FIRST(B)\{ε}) ∪ (FIRST(B)\{ε}) ∪ FIRST(B)
Of note is that while calculating a FIRST set, the last symbol in a sequence of symbols never has the empty string removed from it, because at this point, the empty string is a legitimate possibility. This can be seen in a possible derivation in your example:
S
-> BB
-> εε
-> ε
Hopefully you can see from all of the above why FIRST(B)
appears in your calculation while FIRST(A)
does not.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With