What is the precise definition of a lookahead set?

Tags:

I'm toying around with writing compilers and learning about the theory behind syntax analysis. I've found that even though it's a key concept for understanding recognition algorithms, information about it on the net is fairly poor. It seems StackOverflow is in a unique position to fix this problem.

511

asked Sep 15 '10 19:09

Lyudmil

1 Answers

The lookahead sets for a grammar is defined in terms of the lookahead sets for each of its non-terminals, which in turn rely on the lookahead set for each production. Determining lookahead sets can help us determine if a grammar is LL(1), and if it is, what information we need to construct a recursive-descent parser for it.

Definition: LOOKAHEAD(X -> α) and LOOKAHEAD(X):

LOOKAHEAD(X -> α) = FIRST(α) U FOLLOW(X), if NULLABLE(α)
LOOKAHEAD(X -> α) = FIRST(α), if not NULLABLE(α)
LOOKAHEAD(X) = LOOKAHEAD(X -> α) U LOOKAHEAD(X -> β) U LOOKAHEAD(X -> γ)

where FIRST(α) is the set of terminals that α can begin with, FOLLOW(X) is the set of terminals that can come after an X anywhere in the grammar, and NULLABLE(α) is whether α can derive an empty sequence of terminals (denoted ε). The following definitions are taken from Torben Mogensen's free book, Basics of Compiler Design. See below for an example.

Definition: NULLABLE(X):

NULLABLE(ε) = true
NULLABLE(x) = false, if x is a terminal
NULLABLE(αβ) = NULLABLE(α) and NULLABLE(β)
NULLABLE(P) = NULLABLE(α_1) or NULLABLE(α_2) or ... or NULLABLE(α_n),
               if P is a non-terminal and the right-hand-sides
               of all its productions are α_1, α_2, ..., α_n.

Definition: FIRST(X):

FIRST(ε) = Ø
FIRST(x) = {x}, assuming x is a terminal
FIRST(αβ) = FIRST(α) U FIRST(β), if NULLABLE(α)
          = FIRST(α), if not NULLABLE(α)
FIRST(P) = FIRST(α_1) U FIRST(α_2) U ... U FIRST(α_n),
               if P is a non-terminal and the right-hand-sides
               of all its productions are α_1, α_2, ..., α_n.

Definition: FOLLOW(X):

A terminal symbol a is in FOLLOW(X) if and only if there is a derivation from the start symbol S of the grammar such that S ⇒ αX aβ, where α and β are (possibly empty) sequences of grammar symbols.

Intuition: FOLLOW(X):

Look at where X occurs in the grammar. All terminals that could follow it (directly or by any level of recursion) is in FOLLOW(X). Additionally, if X occurs at the end of a production (e.g. A -> foo X), or is followed by something else that can reduce to ε (e.g. A -> foo X B and B -> ε), then whatever A can be followed by, X can also be followed by (i.e. FOLLOW(A) ⊆ FOLLOW(X)).

See the method for determining FOLLOW(X) in Torben's book and a demonstration of it just below.

An example:

E -> n A
A -> E B
A -> ε
B -> + A
B -> * A

First, NULLABLE and FIRST and are determined:

NULLABLE(E) = NULLABLE(n A) = NULLABLE(n) ∧ NULLABLE(A) = false
NULLABLE(A) = NULLABLE(E B) ∨ NULLABLE(ε) = true
NULLABLE(B) = NULLABLE(+ A) ∨ NULLABLE(* A) = false

FIRST(E) = FIRST(n A) = {n}
FIRST(A) = FIRST(E B) U FIRST(ε) = FIRST(E) U Ø = {n} (because E is not NULLABLE)
FIRST(B) = FIRST(+ A) U FIRST(* A) = FIRST(+) U FIRST(*) = {+, *}

Before FOLLOW is determined, the production E' -> E $ is added, where $ is considered the "end-of-file" non-terminal. Then FOLLOW is determined:

FOLLOW(E): Let β = $, so add the constraint that FIRST($) = {$} ⊆ FOLLOW(E)
           Let β = B, so add the constraint that FIRST(B) = {+, *} ⊆ FOLLOW(E)
FOLLOW(A): Let β = ε, so add the constraint that FIRST(ε) = Ø ⊆ FOLLOW(A).
           Because NULLABLE(ε), add the constraint that FOLLOW(E) ⊆ FOLLOW(A).
           Let β = ε, so add the constraint that FIRST(ε) = Ø ⊆ FOLLOW(A).
           Because NULLABLE(ε), add the constraint that FOLLOW(B) ⊆ FOLLOW(A).
           Let β = ε, so add the constraint that FIRST(ε) = Ø ⊆ FOLLOW(A).
           Because NULLABLE(ε), add the constraint that FOLLOW(B) ⊆ FOLLOW(A).
FOLLOW(B): Let β = ε, so add the constraint that FIRST(ε) = Ø ⊆ FOLLOW(B).
           Because NULLABLE(ε), add the constraint that FOLLOW(A) ⊆ FOLLOW(B).

Resolving these constraints (could also be achieved by fixed-point iteration),

    {+, *, $} ⊆ FOLLOW(E)
    FOLLOW(E) ⊆ FOLLOW(A)
    FOLLOW(A) = FOLLOW(B)

    FOLLOW(E) = FOLLOW(A) = FOLLOW(B) = {+, *, $}.

Now LOOKAHEAD for each production can be determined:

LOOKAHEAD(E -> n A) = FIRST(n A) = {n}     because ¬NULLABLE(n A)
LOOKAHEAD(A -> E B) = FIRST(E B)           because ¬NULLABLE(E B)
                    = FIRST(E) = {n}       because ¬NULLABLE(E)
LOOKAHEAD(A -> ε)   = FIRST(ε) U FOLLOW(A) because NULLABLE(ε)
                    = Ø U {+, *, $} = {+, *, $}
LOOKAHEAD(B -> + A) = FIRST(+ A)           because ¬NULLABLE(+ A)
                    = FIRST(+) = {+}       because ¬NULLABLE(+)
LOOKAHEAD(B -> * A) = {*}                  for the same reason

Finally, LOOKAHEAD for each non-terminal can be determined:

LOOKAHEAD(E) = LOOKAHEAD(E -> n A) = {n}
LOOKAHEAD(A) = LOOKAHEAD(A -> E B) U LOOKAHEAD(A -> ε)   = {n} U {+, *, $}
LOOKAHEAD(B) = LOOKAHEAD(B -> + A) U LOOKAHEAD(B -> * A) = {+, *}

By this knowledge we can determine that this grammar is not LL(1) because its non-terminals have overlapping lookahead sets. (I.e., we cannot create a program that reads one symbol at a time and decides unambiguously which production to use.)

132

answered Oct 21 '22 01:10

sshine

Related questions
                            
                                How to make a symbol table
                            
                                Can Coco/R turn a parsed file into bytecode?
                            
                                The difference between Mono C# Compiler and MS C# Compiler Regarding Scope
                            
                                How can fixed-point combinators make recursive constructs terminate?
                            
                                x86 instruction encoding how to choose opcode
                            
                                Where can I find a full parenthesizer for Haskell?
                            
                                Is there a easier / clearer syntax like Markdown that can compile into LaTeX?
                            
                                SendEmail with Indy components
                            
                                How exactly does the Scala implementation of <:<, <%<, =:= work in the compiler?
                            
                                Visual Studio 2012: Compiler setup (possible to use Compiler from SDK eg. VC++2008-Compiler VC++ 2010-Compiler)
                            
                                Dynamic Compilation in Haskell GHC API Error
                            
                                including static libraries with -all_load flag
                            
                                memcpy underlying data from std::vector of objects
                            
                                Compiler code optimization: AST vs. IR
                            
                                In C++, how does strict bottom-up analysis imply that the return type is not used in overloading resolution?
                            
                                Compile error in VS.NET 2008 (VB.NET) that I can't get rid of!
                            
                                How would you re-use C opcode implementations when writing a JIT with LLVM?
                            
                                Java to C cross compilation [closed]
                            
                                Generating Assembly For an x86 Processor
                            
                                Creating programming languages and compiler designing. Are they related?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

What is the precise definition of a lookahead set?

Tags:

compiler-construction

compiler-theory

Lyudmil

People also ask

1 Answers

sshine

Recent Activity

Donate For Us