Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Boost::Spirit Expression Parser

I have another problem with my boost::spirit parser.

template<typename Iterator>
struct expression: qi::grammar<Iterator, ast::expression(), ascii::space_type> {
    expression() :
        expression::base_type(expr) {
        number %= lexeme[double_];
        varname %= lexeme[alpha >> *(alnum | '_')];

        binop = (expr >> '+' >> expr)[_val = construct<ast::binary_op<ast::add>>(_1,_2)]
              | (expr >> '-' >> expr)[_val = construct<ast::binary_op<ast::sub>>(_1,_2)]
              | (expr >> '*' >> expr)[_val = construct<ast::binary_op<ast::mul>>(_1,_2)]
              | (expr >> '/' >> expr)[_val = construct<ast::binary_op<ast::div>>(_1,_2)] ;

        expr %= number | varname | binop;
    }

    qi::rule<Iterator, ast::expression(), ascii::space_type> expr;
    qi::rule<Iterator, ast::expression(), ascii::space_type> binop;
    qi::rule<Iterator, std::string(), ascii::space_type> varname;
    qi::rule<Iterator, double(), ascii::space_type> number;
};

This was my parser. It parsed "3.1415" and "var" just fine, but when I tried to parse "1+2" it tells me parse failed. I've then tried to change the binop rule to

    binop = expr >>
           (('+' >> expr)[_val = construct<ast::binary_op<ast::add>>(_1, _2)]
          | ('-' >> expr)[_val = construct<ast::binary_op<ast::sub>>(_1, _2)]
          | ('*' >> expr)[_val = construct<ast::binary_op<ast::mul>>(_1, _2)]
          | ('/' >> expr)[_val = construct<ast::binary_op<ast::div>>(_1, _2)]);

But now it's of course not able to build the AST, because _1 and _2 are set differently. I have only seen something like _r1 mentioned, but as a boost-Newbie I am not quite able to understand how boost::phoenix and boost::spirit interact.

How to solve this?

like image 805
Lanbo Avatar asked Dec 11 '11 15:12

Lanbo


1 Answers

It isn't entirely clear to me what you are trying to achieve. Most importantly, are you not worried about operator associativity? I'll just show simple answers based on using right-recursion - this leads to left-associative operators being parsed.

The straight answer to your visible question would be to juggle a fusion::vector2<char, ast::expression> - which isn't really any fun, especially in Phoenix lambda semantic actions. (I'll show below, what that looks like).

Meanwhile I think you should read up on the Spirit docs

  • here in the old Spirit docs (eliminating left recursion); Though the syntax no longer applies, Spirit still generates LL recursive descent parsers, so the concept behind left-recursion still applies. The code below shows this applied to Spirit Qi
  • here: the Qi examples contain three calculator samples, which should give you a hint on why operator associativity matters, and how you would express a grammar that captures the associativity of binary operators. Obviously, it also shows how to support parenthesized expressions to override the default evaluation order.

Code:

I have three version of code that works, parsing input like:

std::string input("1/2+3-4*5");

into an ast::expression grouped like (using BOOST_SPIRIT_DEBUG):

<expr>
  ....
  <success></success>
  <attributes>[[1, [2, [3, [4, 5]]]]]</attributes>
</expr>

The links to the code are here:

  • step_#1_reduce_semantic_actions.cpp
  • step_#2_drop_rule.cpp
  • step_#0_vector2.cpp

Step 1: Reduce semantic actions

First thing, I'd get rid of the alternative parse expressions per operator; this leads to excessive backtracking1. Also, as you've found out, it makes the grammar hard to maintain. So, here is a simpler variation that uses a function for the semantic action:

1check that using BOOST_SPIRIT_DEBUG!

static ast::expression make_binop(char discriminant, 
     const ast::expression& left, const ast::expression& right)
{
    switch(discriminant)
    {
        case '+': return ast::binary_op<ast::add>(left, right);
        case '-': return ast::binary_op<ast::sub>(left, right);
        case '/': return ast::binary_op<ast::div>(left, right);
        case '*': return ast::binary_op<ast::mul>(left, right);
    }
    throw std::runtime_error("unreachable in make_binop");
}

// rules:
number %= lexeme[double_];
varname %= lexeme[alpha >> *(alnum | '_')];

simple = varname | number;
binop = (simple >> char_("-+*/") >> expr) 
    [ _val = phx::bind(make_binop, qi::_2, qi::_1, qi::_3) ]; 

expr = binop | simple;

Step 2: Remove redundant rules, use _val

As you can see, this has the potential to reduce complexity. It is only a small step now, to remove the binop intermediate (which has become quite redundant):

number %= lexeme[double_];
varname %= lexeme[alpha >> *(alnum | '_')];

simple = varname | number;
expr = simple [ _val = _1 ] 
    > *(char_("-+*/") > expr) 
            [ _val = phx::bind(make_binop, qi::_1, _val, qi::_2) ]
    > eoi;

As you can see,

  • within the expr rule, the _val lazy placeholder is used as a pseudo-local variable that accumulates the binops. Across rules, you'd have to use qi::locals<ast::expression> for such an approach. (This was your question regarding _r1).
  • there are now explicit expectation points, making the grammar more robust
  • the expr rule no longer needs to be an auto-rule (expr = instead of expr %=)

Step 0: Wrestle fusion types directly

Finally, for fun and gory, let me show how you could have handled your suggested code, along with the shifting bindings of _1, _2 etc.:

static ast::expression make_binop(
        const ast::expression& left, 
        const boost::fusion::vector2<char, ast::expression>& op_right)
{
    switch(boost::fusion::get<0>(op_right))
    {
        case '+': return ast::binary_op<ast::add>(left, boost::fusion::get<1>(op_right));
        case '-': return ast::binary_op<ast::sub>(left, boost::fusion::get<1>(op_right));
        case '/': return ast::binary_op<ast::div>(left, boost::fusion::get<1>(op_right));
        case '*': return ast::binary_op<ast::mul>(left, boost::fusion::get<1>(op_right));
    }
    throw std::runtime_error("unreachable in make_op");
}

// rules:
expression::base_type(expr) {
number %= lexeme[double_];
varname %= lexeme[alpha >> *(alnum | '_')];

simple = varname | number;
binop %= (simple >> (char_("-+*/") > expr)) 
    [ _val = phx::bind(make_binop, qi::_1, qi::_2) ]; // note _2!!!

expr %= binop | simple;

As you can see, not nearly as much fun writing the make_binop function that way!

like image 148
sehe Avatar answered Nov 15 '22 18:11

sehe