Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What are FIRST and FOLLOW sets used for in parsing?

What are FIRST and FOLLOW sets? What are they used for in parsing? Are they used for top-down or bottom-up parsers?

Can anyone explain me FIRST and FOLLOW SETS for the following set of grammar rules:

E := E+T | T
T := T*V | T
V := <id>
like image 684
user460920 Avatar asked Oct 14 '22 23:10

user460920


1 Answers

They are typically used in LL (top-down) parsers to check if the running parser would encounter any situation where there is more than one way to continue parsing.

If you have the alternative A | B and also have FIRST(A) = {"a"} and FIRST(B) = {"b", "a"} then you would have a FIRST/FIRST conflict because when "a" comes next in the input you wouldn't know whether to expand A or B. (Assuming lookahead is 1).

On the other side if you have a Nonterminal that is nullable like AOpt: ("a")? then you have to make sure that FOLLOW(AOpt) doesn't contain "a" because otherwise you wouldn't know if to expand AOpt or not like here: S: AOpt "a" Either S or AOpt could consume "a" which gives us a FIRST/FOLLOW conflict.

FIRST sets can also be used during the parsing process for performance reasons. If you have a nullable nonterminal NullableNt you can expand it in order to see if it can consume anything, or it may be faster to check if FIRST(NullableNt) contains the next token and if not simply ignore it (backtracking vs predictive parsing). Another performance improvement would be to additionally provide the lexical scanner with the current FIRST set, so the scanner does not try all possible terminals but only those that are currently allowed by the context. This conflicts with reserved terminals but those are not always needed.

Bottom up parsers have different kinds of conflicts namely Reduce/Reduce and Shift/Reduce. They also use item sets to detect conflicts and not FIRST,FOLLOW.

Your grammar would't work with LL-parsers because it contains left recursion. But the FIRST sets for E, T and V would be {id} (assuming your T := T*V | T is meant to be T := T*V | V).

like image 137
user764754 Avatar answered Oct 21 '22 09:10

user764754