Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Recursive left-recursion in hand-written recursive ascent parser

Tags:

c++

parsing

c++11

I've been writing some recursive ascent parsers, and one of the things I've been struggling with is left recursion. It seems obvious to me that right recursion can be expressed recursively, like

addExpr
    : primaryExpr '+' addExpr
    | primaryExpr;

by something along the lines of

parseAddExpr() {
    auto x = parsePrimaryExpr();
    if (next_token == '+') {
        auto result = make_unique<PlusExpr>();
        result->lhs = x;
        result->rhs = parseAddExpr();
        return std::move(result);
    }
    return std::move(x);
}

But for left recursion, all I could come up with is a while loop.

mulExpr
    : mulExpr '*' addExpr
    | addExpr;

parseMulExpr() {
    auto expr = parseAddExpr();
    while(next_token == '*') {
        auto mul_expr = make_unique<MulExpr>();
        mul_expr->lhs = std::move(expr);
        mul_expr->rhs = parseAddExpr();
        expr = std::move(mul_expr);
    }
    return std::move(expr);
}

I mean, this functions fine, but I always felt that it should have a recursive version. Is it possible to implement a left-associative operator in a recursive, instead of iterative, fashion?

like image 590
Puppy Avatar asked Sep 08 '12 18:09

Puppy


1 Answers

Your functions are recursive descent, not recursive ascent. The problem that recursive descent parsers have with left recursion, which you have encountered, is very well known and studied. Any compilers course or textbook that covers parsing will discuss the problem and ways to solve it.

Using iteration is a perfectly normal, valid way to handle it. See these course notes for example. In those notes, the rule T -> T '*' S | T '/' S | S (which is your mulExpr rule with division added) is transformed into the rule T -> S { ('*' | '/') S }, where the braces {...} mean “zero or more repetitions”.

UPDATE

Based on your comment, I think you have some confusion about what “recursive descent” means and what “recursive ascent” means.

Recursive descent

The basic idea of recursive descent is to create one function for each nonterminal in the grammar. The job of the function corresponding to some nonterminal A is to completely parse one instance of A if possible, calling as necessary the functions for the nonterminals appearing on the right-hand side of A's productions in the grammar.

So, for example, your grammar has a nonterminal addExpr with these two productions:

addExpr -> primaryExpr '+' addExpr
addExpr -> primaryExpr

Therefore a recursive descent parser will have a function for addExpr that tries to match the right-hand side of one of the addExpr productions, calling the functions for primaryExpr and addExpr (itself!) as necessary because those nonterminals appear in addExpr's productions.

And indeed this is exactly what you have in your parseAddExpr function. It looks for a way to match one of the addExpr productions, calling parsePrimaryExpr and parseAddExpr as necessary.

Recursive ascent

Recursive ascent is a (very uncommon) way to implement LR parsing. An LR parser has a state table, with a row for each state and a column for each terminal. In a recursive ascent parser, we don't represent the table as data. Instead, we create one function for each state, and that's state's row is embodied as a switch statement in the function.

In an LR parser, there is not normally a one-to-one correspondence between states and nonterminals, or between states and productions. Generally there will be more states than productions. Each state represents a set of positions within productions.

Your parser

Looking at the functions in your post, I see no evidence that you've constructed or embodied a state table. What I see is a set of functions that correspond directly to the nonterminals of your grammar. That correspondence is the hallmark of recursive descent.

Also, the fact that you're encountering trouble with left recursive productions is a dead giveaway that you're using recursive descent. LR parsers do not have problems with left recursion and recursive descent parsers do.

like image 116
rob mayoff Avatar answered Sep 20 '22 01:09

rob mayoff