Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to calculate FIRST sets by hand

I don't understand one of the examples provided by my tutor.

Example

S ::= aBA | BB | Bc
A ::= Ad | d
B ::= ε

We have

FIRST(B) = FIRST(ε)
         = {ε}

FIRST(A) = FIRST(Ad) ∪ FIRST(d)
         = FIRST(A) ∪ {d}
         = {d}

FIRST(S) = FIRST(aBA) ∪ FIRST(BB) ∪ FIRST(Bc)
         = FIRST(a) ∪ (FIRST(B)\{ε}) ∪ FIRST(B) ∪ (FIRST(B)\{ε) ∪ FIRST(c)
         = {a, ε, c}

Why is there a FIRST(B) in the FIRST(S) calculation? Shouldn't it be

(FIRST(B)\{ε)?

Why is A missing from FIRST(S) calculation?

like image 973
Danny Rancher Avatar asked Oct 28 '13 11:10

Danny Rancher


1 Answers

This page gives the mechanical rules for deriving FIRST (and FOLLOW) sets. I'll try to explain the logic behind these rules and how they apply to your example.

FIRST sets

FIRST(u) is the set of terminals that can occur first in a full derivation of u, where u is a sequence of terminals and non-terminals. In other words, when calculating the FIRST(u) set, we are looking only for the terminals that could possibly be the first terminal of a string that can be derived from u.

FIRST(aBA)

Given the definition, we can see that FIRST(aBA) reduces to FIRST(a), then to a. This is because no matter what the A and B productions are, the terminal a will always occur first in anything derived from aBA since a is a terminal, and can't be removed from the front of that sequence.

FIRST(Bc)

I'm going to skip FIRST(BB) for now and move on to FIRST(Bc). Things are different here, since B is a non-terminal. At first, we say that anything in FIRST(B) is also in FIRST(S). Unfortunately, FIRST(B) contains ε which causes problems, as we could have the scenario

   FIRST(Bc)
-> FIRST(εc)
=  FIRST(c)
=  c

where the arrow is a possible derivation/reduction. In general, we therefore say that FIRST(Xu), where ε is in FIRST(X), is equal to (FIRST(X)\{ε}) ∪ FIRST(u). This explains the last two terms in your calculation.

FIRST(BB)

Using the above rule, we can now derive FIRST(BB) as (FIRST(B)\{ε}) ∪ FIRST(B). Similarly, if we were calculating FIRST(BBB) we would reduce it as

  FIRST(BBB)
= (FIRST(B)\{ε}) ∪ FIRST(BB)
= (FIRST(B)\{ε}) ∪ (FIRST(B)\{ε}) ∪ FIRST(B)

Of note is that while calculating a FIRST set, the last symbol in a sequence of symbols never has the empty string removed from it, because at this point, the empty string is a legitimate possibility. This can be seen in a possible derivation in your example:

   S
-> BB
-> εε
-> ε

Hopefully you can see from all of the above why FIRST(B) appears in your calculation while FIRST(A) does not.

like image 68
DPenner1 Avatar answered Nov 11 '22 11:11

DPenner1