Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Can a table-based LL parser handle repetition without right-recursion?

I understand how an LL recursive descent parser can handle rules of this form:

A = B*;

with a simple loop that checks whether to continue looping or not based on whether the lookahead token matches a terminal in the FIRST set of B. However, I'm curious about table based LL parsers: how can rules of this form work there? As far as I know, the only way to handle repetition like this in one is through right-recursion, but that messes up associativity in cases where a right-associative parse tree is not desired.

I'd like to know because I'm currently attempting to write an LL(1) table-based parser generator and I'm not sure how to handle a case like this without changing the intended parse tree shape.

like image 628
chbaker0 Avatar asked Oct 17 '14 23:10

chbaker0


People also ask

Is LL parser recursive-descent?

A form of recursive-descent parsing that does not require any back-tracking is known as predictive parsing. It is also called as LL(1) parsing table technique since we would be building a table for string to be parsed. It has capability to predict which production is to be used to replace input string.

What are the limitations of recursive descent parser?

One of major drawback or recursive-descent parsing is that it can be implemented only for those languages which support recursive procedure calls and it suffers from the problem of left-recursion.

What is LR parser explain left recursion in LR parsing process?

LR parsing is one type of bottom up parsing. It is used to parse the large class of grammars. In the LR parsing, "L" stands for left-to-right scanning of the input. "R" stands for constructing a right most derivation in reverse.

What are the problems in recursive parser?

The main limitation of recursive descent parsing (and top-down parsing algorithms in general) is that they only work on grammars with certain properties. For example, if a grammar contains any left recursion, recursive descent parsing doesn't work.


2 Answers

The Grammar

Let's expand your EBNF grammar to simple BNF and assume, that b is a terminal and <e> is an empty string:

A -> X
X -> BX
X -> <e>
B -> b

This grammar produces strings of terminal b's of any length.

The LL(1) Table

To construct the table, we will need to generate the first and follow sets (constructing an LL(1) parsing table).

First sets

First(α) is the set of terminals that begin strings derived from any string of grammar symbols α.

First(A) : b, <e>
First(X) : b, <e>
First(B) : b

Follow sets

Follow(A) is the set of terminals a that can appear immediately to the right of a nonterminal A.

Follow(A) : $
Follow(X) : $
Follow(B) : b$

Table

We can now construct the table based on the sets, $ is the end of input marker.

+---+---------+----------+
|   |    b    |    $     |
+---+---------+----------+
| A | A -> X  | A -> X   |
| X | X -> BX | X -> <e> |
| B | B -> b  |          |
+---+---------+----------+

The parser action always depends on the top of the parse stack and the next input symbol.

  1. Terminal on top of the parse stack:
    1. Matches the input symbol: pop stack, advance to the next input symbol
    2. No match: parse error
  2. Nonterminal on top of the parse stack:
    1. Parse table contains production: apply production to stack
    2. Cell is empty: parse error
  3. $ on top of the parse stack:
    1. $ is the input symbol: accept input
    2. $ is not the input symbol: parse error

Sample Parse

Let us analyze the input bb. The initial parse stack contains the start symbol and the end marker A $.

+-------+-------+-----------+
| Stack | Input |  Action   |
+-------+-------+-----------+
| A $   | bb$   | A -> X    |
| X $   | bb$   | X -> BX   |
| B X $ | bb$   | B -> b    |
| b X $ | bb$   | consume b |
| X $   | b$    | X -> BX   |
| B X $ | b$    | B -> b    |
| b X $ | b$    | consume b |
| X $   | $     | X -> <e>  |
| $     | $     | accept    |
+-------+-------+-----------+

Conclusion

As you can see, rules of the form A = B* can be parsed without problems. The resulting concrete parse tree for input bb would be:

parse tree

like image 158
Jen-Ya Avatar answered Sep 30 '22 02:09

Jen-Ya


Yes, this is definitely possible. The standard method of rewriting to BNF and constructing a parse table is useful for figuring out how the parser should work – but as far as I can tell, what you're asking is how you can avoid the recursive part, which would mean that you'd get the slanted binary tree/linked list form of AST.

If you're hand-coding the parser, you can simply use a loop, using the lookaheads from the parse table that indicate a recursive call to decide to go around the loop once more. (I.e., you could just use while with those lookaheads as the condition.) Then for each iteration, you simply append the constructed subtree as a child of the current parent. In your case, then, A would get several direct B-children.

Now, as I understand it, you're building a parser generator, and it might be easiest to follow the standard procedure, going via plan BNF. However, that's not really an issue; there is no substantive difference between iteration and recursion, after all. You simply have to have a class of “helper rules” that don't introduce new AST nodes, but that rather append their result to the node of the nonterminal that triggered them. So when turning the repetition into X -> BX, rather than constructing X nodes, you have your X rule extend the child-list of the A or X (whichever triggered it) by its own children. You'll still end up with A having several B children, and no X nodes in sight.

like image 27
Magnus Lie Hetland Avatar answered Sep 30 '22 03:09

Magnus Lie Hetland