Suppose I have a simple grammar like:
X -> number
T -> X
T -> T + X
So for example 3 + 4 + 5 would parse as:
+
/ \
+ 5
/ \
3 4
This has the left-right associativity of + "built into" the grammar.
It is trivially LR(1), however suppose I want to do a hand-written top-down parse of it.
I cannot do so because it is left recursive, so lets left factor it:
X -> number
T -> X T'
T' -> + X T'
T' -> e // empty
If I now write a parser for it (psuedo-code):
parse_X:
if lookahead is number
return pop_lookahead
parse_T:
return (parse_X, parse_T')
parse_T':
if lookahead is +
pop_lookahead
return (parse_X, parse_T')
else
return ();
Then when I call parse_T on an input of 3 + 4 + 5 I get returned a trace like:
parse_T
(parse_X, parse_T')
(3, parse_T')
(3, (parse_X, parse_T'))
(3, (4, parse_T'))
(3, (4, (parse_X, parse_T')))
(3, (4, (5, ())))
See how the parse is "backwards". A tree constructed "naively" from such a parse looks like:
+
/ \
3 +
/ \
4 5
Which has the wrong associativity.
Can anyone clear this up? In general how can I write a hand-written top-down parser that preserves the associativity built into the grammar?
One strategy is to replace recursion over T with iteration over X (in general terms, replace recursion over an operator with iteration over the next-highest-precedence operator). It helps to use EBNF-style notation, in this case
T -> X {+ X}
because then the needed iteration becomes obvious:
parse_T:
val = parse_X
while lookahead is +
pop_lookahead
val = PLUS(val, parse_X)
return val
where PLUS() represents whatever you do to evaluate an addition expression, e.g. construct a tree node.
If you apply this to all operators, you wind up with essentially one function corresponding to EXPRESSION which only recurses when dealing with
PRIMARY -> '(' EXPRESSION ')'
Such an approach leads to a fairly fast expression parser; one of the common objections to using naive recursive descent to parse expressions is that if you have several levels of operator precedence, you can easily need 20 or so nested function calls to parse each PRIMARY, which can get fairly slow. With the iterative approach, it generally takes only one function call per PRIMARY. If you have some right-associative operators (e.g. exponentiation), you can use a recursive approach for them.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With