Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Using boost spirit for a stack based language

I need to parse a rather simple stack based language, e.g.

1 2 add
3 1 sub

and I'm facing two choices here:

  1. Write my own lexer for the tokens and then proceed parsing it
  2. Use boost spirit

I have never used boost spirit but from what I've read around (documentation and examples) I can't still make up my mind on whether it would be overkill to use boost spirit to lex and parse this simple language or if it would make sense to use it instead of rolling out my own lexer and parser (thing that I suppose shouldn't be too hard).

Would using boost spirit for a simple stack based language like the one above pay off (since I need to learn it first before I can use it)?

like image 776
Dean Avatar asked Nov 21 '15 17:11

Dean


1 Answers

In the category "exhaustive explorations", let me add some "on the fly interpreting" stack machines using Spirit Qi (v2.x) and X3

Note that an AST-ful approach (2 stage parse/execute) is shown in the second answer

In Spirit Qi

Here the semantic actions have to be "composed" using Phoenix actors:

Live On Coliru

#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <boost/spirit/repository/include/qi_distinct.hpp>
#include <iostream>
#include <deque>

namespace qi = boost::spirit::qi;
namespace px = boost::phoenix;
namespace qr = boost::spirit::repository::qi;

using Stack = std::deque<int>;

namespace actors {

    struct pop {
        Stack& s_;

        Stack::value_type operator()() const {
            Stack::value_type v = s_.back();
            s_.pop_back();
            return v;
        }
    };

    struct push {
        Stack& s_;
        template <typename V> void operator()(V const& v) const {
            s_.push_back(v);
        }
    };

    struct dump {
        Stack& s_;
        void operator()() const {
            std::copy(s_.begin(), s_.end(), std::ostream_iterator<Stack::value_type>(std::cout, " "));
            std::cout << "\n";
        }
    };
}

int main() {
    Stack stack_;

    boost::spirit::istream_iterator f(std::cin >> std::noskipws), l; // Note the noskipws!
    bool ok;

    {
        using namespace qi;
        px::function<actors::pop>  pop_  = actors::pop{ stack_ };
        px::function<actors::push> push_ = actors::push{ stack_ };
        px::function<actors::dump> dump_ = actors::dump{ stack_ };

        ok = phrase_parse(f, l, 
               *(
                   eps [ dump_() ] >> 
                   (lexeme [ qr::distinct(graph) [
                          lit("add") [ push_(  pop_() + pop_()) ]
                        | lit("sub") [ push_(- pop_() + pop_()) ] // bit hackish
                        | lit("mul") [ push_(pop_() * pop_()) ]
                        | lit("div") [ push_(pop_() / pop_()) ] // TODO fix order
                        | lit("pop") [ pop_() ]
                      ] ] 
                    | int_ [ push_(_1) ]
                  )
                ), space);
    }

    if (!ok)
        std::cout << "Parse failed\n";

    if (f != l)
        std::cout << "Unparsed program data: '" << std::string(f,l) << "'\n";
}

Printing

1 
1 2 
3 
3 3 
3 3 1 
3 2 
6 

Notes:

  • it gets ugly (the learning curve is hefty; See also Boost Spirit: "Semantic actions are evil"?)
  • the comments speak volumes; actually fixing that order or operands in sub and div is not easy. It's gonna require some advanced Phoenix-fu (http://www.boost.org/doc/libs/1_59_0/libs/phoenix/doc/html/phoenix/modules/scope/let.html)

In Spirit X3

The idea is the same, but we can use proper functional composition using lambdas.

We even use a helper to dynamically generate the parser expression along with suitable binop:

Live On Coliru

#include <boost/spirit/home/x3.hpp>
#include <boost/spirit/include/support_istream_iterator.hpp>
#include <iostream>
#include <deque>
#include <cassert>

int main() {
    std::deque<int> stack_;

    boost::spirit::istream_iterator f(std::cin >> std::noskipws), l; // Note the noskipws!
    bool ok;

    {
        using namespace boost::spirit::x3;
        struct stack_tag {};

        auto binop = [](auto id, auto f) {
            auto apply = [=](auto& ctx) {
                auto& s = get<stack_tag>(ctx);
                assert(s.size()>=2);

                auto rhs = s.back(); s.pop_back();
                auto lhs = s.back(); s.pop_back();
                s.push_back(f(lhs, rhs));
            };

            return lexeme[as_parser(id) >> !graph] [apply];
        };

        auto push = [](auto& ctx) {
            auto& s = get<stack_tag>(ctx);
            s.push_back(_attr(ctx));
        };

        auto dump = [](auto& ctx) {
            auto& s = get<stack_tag>(ctx);
            std::copy(s.begin(), s.end(), std::ostream_iterator<int>(std::cout, " "));
            std::cout << "\n";
        };

        auto instr   = binop("add", [](auto a, auto b) { return a + b; })
                     | binop("sub", [](auto a, auto b) { return a - b; })
                     | binop("mul", [](auto a, auto b) { return a * b; })
                     | binop("div", [](auto a, auto b) { return a / b; })
                     | int_ [ push ]
                     ;

        auto parser  = skip(space) [ *(eps [ dump ] >> instr) >> eps/*post-skip*/ ];
        auto machine = with<stack_tag>(stack_) [parser];

        ok = parse(f, l, machine);
    }

    if (!ok)
        std::cout << "Parse failed\n";

    if (f != l)
        std::cout << "Unparsed program data: '" << std::string(f,l) << "'\n";
}

Of course it prints the same output.

  • it has none of the disadvantages the Qi version has
  • it compiles much faster (2.9s versus 9.2s!)
  • Note: X3 requires C++14
like image 90
sehe Avatar answered Oct 13 '22 22:10

sehe