As I understand, in the following case left factoring is required to build top-down parser. But it's hard to understand how to do that? Can someone help me here? Thanks.
s = a | b
b = c d
c = (e | f) g
e = a | h
Left Factoring is a grammar transformation technique. It consists in "factoring out" prefixes which are common to two or more productions. Left Recursion is a property a grammar has whenever you can derive from a given variable (non terminal) a rhs that begins with the same variable, in one or more steps.
In terms of context-free grammar, a nonterminal is left-recursive if the leftmost symbol in one of its productions is itself (in the case of direct left recursion) or can be made itself by some sequence of substitutions (in the case of indirect left recursion).
Left factoring transforms the grammar to make it useful for top-down parsers. In this technique, we make one production for each common prefixes and the rest of the derivation is added by new productions. Now the parser has only one production per prefix which makes it easier to take decisions.
It is a process of factoring out the common prefixes of alternatives. It is used when it is not clear that which of the two alternatives is used to expand the non-terminal.
Every non-terminal is only referenced once here, so we can pull the entire grammar together in a single expression:
s = a | ((a | h | f) g d)
So we have two basic variations, the terminal a optionally followed by g then d, or else one of h or f always followed by g then d.
So we have
s = b' | c'
b' = a | a g d
c' = (h | f) g d
or, pulling the common g d sequence into its own production
s = b' | c'
b' = a | a e'
c' = (h | f) e'
e' = g d
We can then pull a up as a common starting symbol in b' by introducing an E (empty) option:
s = b'' | c'
b'' = a (e' | E)
c' = (h | f) e'
e' = g d
The grammar is now unambiguous.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With