Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Specifying C++11 Grammar Action Functions in Shift/Reduce Parser Generator?

I'm working on a shift/reduce parser generator in C++11 and I am not sure how to specify the interface type of the input productions and reduction action functions such that they will hold the information I want to put in them.

I want to specify the grammar statically but using C++ types (not a separate build tool).

For each symbol (terminals and non-terminals) the user provides a string name and a type.

Then each production specifies a head symbol name and one or more body symbol names.

For each production an action function is provided by the user (the hard part) that returns the head nonterminal type and has parameters corresponding to the production body symbols (of their corresponding types).

The main problem is statically binding the parameter types and return type of these action functions to the corresponding symbol types

So for example:

Suppose we have nonterminals X, A B C

Their names/types might be:

"X" Foo
"A" string
"B" string
"C" int

And in the grammar there might be a production:

X -> A B C

And there will be an action function provided by the user for that production:

Foo f(string A, string B, int C)

If that production is reduced than the function f should be called with the production body parameters. The value returned by f is then stored for when that symbol is used in a higher up reduction.

So to specify the grammar to the parser generator I need to provide something like:

(I know the following is invalid)

struct Symbol
{
    string name;
    type T;
}

struct Production
{
    string head;
    vector<string> body;

    function<head.T(body[0].T, body[1].T, ..., body[n].T)> action;
}

struct Grammar
{
    vector<Symbol> symbols;

    vector<Production> productions;
}

And to specify the earlier example would be:

Grammar example =
{
    // symbols
    {
        { "X", Foo },
        { "A", string },
        { "B", string },
        { "C", int }
    },

    // productions
    {
        { 
            "X",
            { "A", "B", "C" },
            [](string A, string B, int C) { ... return Foo(...); }
        }
    }
}

This won't work of course, you can't mix type parameters with runtime parameters like that.

One solution would be to have some generic base:

struct SymbolBase
{
    ...
}

template<class SymbolType>
struct SymbolDerived<SymbolType> : SymbolBase
{
    SymbolType value;
}

and then make all action functions of type:

typedef function<SymbolBase(vector<SymbolBase>)> ActionFunction;

and sort it out at runtime. But this makes usage more difficult, and all the casting is slow. I'd rather have the function signatures checked at compile-time and keep the mechanics hidden from the user.

How can I restructure the Symbol, Production and Grammar types to carry the information I am trying to convey in legal C++11?

(Yes I have looked at Boost Spirit and friends, it is a fine framework but it is recursive descent so the languages it can handle in a single pass are fewer than a LALR parser and because it uses backtracking the reduction actions will get called multiple times, etc, etc)

like image 908
Andrew Tomazos Avatar asked Nov 12 '22 14:11

Andrew Tomazos


1 Answers

I've been playing around with precisely this problem. Once possibility I've been looking at, which looks like it should work, is to use a stack of variant objects, perhaps boost::variant or boost::any. Since each reduction knows what it's expecting from the stack, the access will be type-safe; unfortunately, the type-check will be at run-time, but it should be very cheap. This has the advantage of catching bugs :) and it will also correctly destruct objects as they're popped from the stack.

I threw together some sample code as a PoC, available upon request. The basic style for writing a reduction rule is something like this:

parse.reduce<Expression(Expression, _, Expression)>
  ( [](Expression left, Expression right){
      return BinaryOperation(Operator::Times, left, right);
  });

which corresponds to the rule:

expression: expression TIMES expression

Here, BinaryOperation is the AST node-type, and must be convertible to Expression; the template argument Expression(Expression, _, Expression) is exactly the left-hand-side and right-hand-side of the production, expressed as types. (Because the second RHS type is _, the templates don't bother feeding the value to the reduction rule: with a proper parser generator, there would actually be no reason to even push punctuation tokens onto the stack in the first place.) I implemented both the tagged union Expression and the tagged type of the parser stack using boost::variant. In case you try this, it's worth knowing that using a variant as one of the option types of another variant doesn't really work. In the end, it was easiest to wrap the smaller union as a struct. You also really have to read the section about recursive types.

like image 138
rici Avatar answered Nov 15 '22 06:11

rici