Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Parsing nested key value pairs in Boost Spirit

I am having trouble writing what I think should be a simple parser using Boost::Spirit. (I'm using Spirit instead of just using string functions as this is partly a learning exercise for me).

Data

The data to parse takes the form of key value pairs, where a value can itself be a key value pair. Keys are alphanumeric (with underscores and no digit as first character); values are alphanumeric plus .-_ - the values can be dates in the format DD-MMM-YYYY e.g. 01-Jan-2015 and floating point numbers like 3.1415 in addition to plain old alphanumeric strings. Keys and values are separated with =; pairs are separated with ;; structured values are delimited with {...}. At the moment I am erasing all spaces from the user input before passing it to Spirit.

Example input:

Key1 = Value1; Key2 = { NestedKey1=Alan; NestedKey2 = 43.1232; }; Key3 = 15-Jul-1974 ;

I would then strip all spaces to give

Key1=Value1;Key2={NestedKey1=Alan;NestedKey2=43.1232;};Key3=15-Jul-1974;

and then I actually pass it to Spirit.

Problem

What I currently have works just dandy when values are simply values. When I start encoding structured values in the input then Spirit stops after the first structured value. A workaround if there is only one structured value is to put it at the end of the input... but I will need two or more structured values on occasion.

The code

The below compiles in VS2013 and illustrates the errors:

#include <boost/spirit/include/qi.hpp>
#include <boost/fusion/include/pair.hpp>
#include <boost/fusion/adapted.hpp>
#include <map>
#include <string>
#include <iostream>

typedef std::map<std::string, std::string> ARGTYPE;

#define BOOST_SPIRIT_DEBUG

namespace qi = boost::spirit::qi;
namespace fusion = boost::fusion;

template < typename It, typename Skipper>
struct NestedGrammar : qi::grammar < It, ARGTYPE(), Skipper >
{
    NestedGrammar() : NestedGrammar::base_type(Sequence)
    {
        using namespace qi;
        KeyName = qi::char_("a-zA-Z_") >> *qi::char_("a-zA-Z0-9_");
        Value = +qi::char_("-.a-zA-Z_0-9");

        Pair = KeyName >> -(
            '=' >> ('{' >> raw[Sequence] >> '}' | Value)
            );

        Sequence = Pair >> *((qi::lit(';') | '&') >> Pair);

        BOOST_SPIRIT_DEBUG_NODE(KeyName);
        BOOST_SPIRIT_DEBUG_NODE(Value);
        BOOST_SPIRIT_DEBUG_NODE(Pair);
        BOOST_SPIRIT_DEBUG_NODE(Sequence);
    }
private:
    qi::rule<It, ARGTYPE(), Skipper> Sequence;
    qi::rule<It, std::string()> KeyName;
    qi::rule<It, std::string(), Skipper> Value;
    qi::rule<It, std::pair < std::string, std::string>(), Skipper> Pair;
};


template <typename Iterator>
ARGTYPE Parse2(Iterator begin, Iterator end)
{
    NestedGrammar<Iterator, qi::space_type> p;
    ARGTYPE data;
    qi::phrase_parse(begin, end, p, qi::space, data);
    return data;
}


// ARGTYPE is std::map<std::string,std::string>
void NestedParse(std::string Input, ARGTYPE& Output)
{
    Input.erase(std::remove_if(Input.begin(), Input.end(), isspace), Input.end());
    Output = Parse2(Input.begin(), Input.end());
}

int main(int argc, char** argv)
{
    std::string Example1, Example2, Example3;
    ARGTYPE Out;

    Example1 = "Key1=Value1 ; Key2 = 01-Jan-2015; Key3 = 2.7181; Key4 = Johnny";
    Example2 = "Key1 = Value1; Key2 = {InnerK1 = one; IK2 = 11-Nov-2011;};";
    Example3 = "K1 = V1; K2 = {IK1=IV1; IK2=IV2;}; K3=V3; K4 = {JK1=JV1; JK2=JV2;};";

    NestedParse(Example1, Out);
    for (ARGTYPE::iterator i = Out.begin(); i != Out.end(); i++)
        std::cout << i->first << "|" << i->second << std::endl;
    std::cout << "=====" << std::endl;

    /* get the following, as expected:
    Key1|Value1
    Key2|01-Jan-2015
    Key3|2.7181
    Key4|Johnny
    */

    NestedParse(Example2, Out);
    for (ARGTYPE::iterator i = Out.begin(); i != Out.end(); i++)
        std::cout << i->first << "|" << i->second << std::endl;
    std::cout << "=====" << std::endl;

    /* get the following, as expected:
    Key1|Value1
    key2|InnerK1=one;IK2=11-Nov-2011
    */

    NestedParse(Example3, Out);
    for (ARGTYPE::iterator i = Out.begin(); i != Out.end(); i++)
        std::cout << i->first << "|" << i->second << std::endl;

    /* Only get the first two lines of the expected output:
    K1|V1
    K2|IK1=IV1;IK2=IV2
    K3|V3
    K4|JK1=JV1;JK2=JV2
    */

    return 0;

}

I'm not sure if the problem is down to my ignorance of BNF, my ignorance of Spirit, or perhaps my ignorance of both at this point.

Any help appreciated. I've read e.g. Spirit Qi sequence parsing issues and links therein but I still can't figure out what I am doing wrong.

like image 559
manx_shearwater Avatar asked Oct 31 '22 03:10

manx_shearwater


1 Answers

Indeed this precisely a simple grammar that Spirit excels at.

Moreover there is absolutely no need to skip whitespace up front: Spirit has skippers built in for the purpose.

To your explicit question, though:

The Sequence rule is overcomplicated. You could just use the list operator (%):

Sequence = Pair % char_(";&");

Now your problem is that you end the sequence with a ; that isn't expected, so both Sequence and Value fail the parse eventually. This isn't very clear unless you #define BOOST_SPIRIT_DEBUG¹ and inspect the debug output.

So to fix it use:

Sequence = Pair % char_(";&") >> -omit[char_(";&")];

Fix Live On Coliru (or with debug info)

Prints:

Key1|Value1
Key2|01-Jan-2015
Key3|2.7181
Key4|Johnny
=====
Key1|Value1
Key2|InnerK1=one;IK2=11-Nov-2011;
=====
K1|V1
K2|IK1=IV1;IK2=IV2;
K3|V3
K4|JK1=JV1;JK2=JV2;

Bonus Cleanup

Actually, that was simple. Just remove the redundant line removing whitespace. The skipper was already qi::space.

(Note though that the skipper doesn't apply to your Value rule, so values cannot contain whitespace but the parser will not silently skip it either; I suppose this is likely what you want. Just be aware of it).

Recursive AST

You would actually want to have a recursive AST, instead of parsing into a flat map.

Boost recursive variants make this a breeze:

namespace ast {
    typedef boost::make_recursive_variant<std::string, std::map<std::string, boost::recursive_variant_> >::type Value;
    typedef std::map<std::string, Value> Sequence;
}

To make this work you just change the declared attribute types of the rules:

qi::rule<It, ast::Sequence(),                      Skipper> Sequence;
qi::rule<It, std::pair<std::string, ast::Value>(), Skipper> Pair;
qi::rule<It, std::string(),                        Skipper> String;
qi::rule<It, std::string()>                                 KeyName;

The rules themselves don't even have to change at all. You will need to write a little visitor to stream the AST:

static inline std::ostream& operator<<(std::ostream& os, ast::Value const& value) {
    struct vis : boost::static_visitor<> {
        vis(std::ostream& os, std::string indent = "") : _os(os), _indent(indent) {}

        void operator()(std::map<std::string, ast::Value> const& map) const {
            _os << "map {\n";
            for (auto& entry : map) {
                _os << _indent << "    " << entry.first << '|';
                boost::apply_visitor(vis(_os, _indent+"    "), entry.second);
                _os << "\n";
            }
            _os << _indent << "}\n";
        }
        void operator()(std::string const& s) const {
            _os << s;
        }

    private:
        std::ostream& _os;
        std::string _indent;
    };
    boost::apply_visitor(vis(os), value);
    return os;
}

Now it prints:

map {
    Key1|Value1
    Key2|01-Jan-2015
    Key3|2.7181
    Key4|Johnny
}

=====
map {
    Key1|Value1
    Key2|InnerK1 = one; IK2 = 11-Nov-2011;
}

=====
map {
    K1|V1
    K2|IK1=IV1; IK2=IV2;
    K3|V3
    K4|JK1=JV1; JK2=JV2;
}

Of course, the clincher is when you change raw[Sequence] to just Sequence now:

map {
    Key1|Value1
    Key2|01-Jan-2015
    Key3|2.7181
    Key4|Johnny
}

=====
map {
    Key1|Value1
    Key2|map {
        IK2|11-Nov-2011
        InnerK1|one
    }

}

=====
map {
    K1|V1
    K2|map {
        IK1|IV1
        IK2|IV2
    }

    K3|V3
    K4|map {
        JK1|JV1
        JK2|JV2
    }

}

Full Demo Code

Live On Coliru

//#define BOOST_SPIRIT_DEBUG
#include <boost/variant.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/fusion/adapted/std_pair.hpp>
#include <iostream>
#include <string>
#include <map>

namespace ast {
    typedef boost::make_recursive_variant<std::string, std::map<std::string, boost::recursive_variant_> >::type Value;
    typedef std::map<std::string, Value> Sequence;
}

namespace qi = boost::spirit::qi;

template <typename It, typename Skipper>
struct NestedGrammar : qi::grammar <It, ast::Sequence(), Skipper>
{
    NestedGrammar() : NestedGrammar::base_type(Sequence)
    {
        using namespace qi;
        KeyName = qi::char_("a-zA-Z_") >> *qi::char_("a-zA-Z0-9_");
        String = +qi::char_("-.a-zA-Z_0-9");

        Pair = KeyName >> -(
                '=' >> ('{' >> Sequence >> '}' | String)
            );

        Sequence = Pair % char_(";&") >> -omit[char_(";&")];

        BOOST_SPIRIT_DEBUG_NODES((KeyName) (String) (Pair) (Sequence))
    }
private:
    qi::rule<It, ast::Sequence(),                      Skipper> Sequence;
    qi::rule<It, std::pair<std::string, ast::Value>(), Skipper> Pair;
    qi::rule<It, std::string(),                        Skipper> String;
    qi::rule<It, std::string()>                                 KeyName;
};


template <typename Iterator>
ast::Sequence DoParse(Iterator begin, Iterator end)
{
    NestedGrammar<Iterator, qi::space_type> p;
    ast::Sequence data;
    qi::phrase_parse(begin, end, p, qi::space, data);
    return data;
}

static inline std::ostream& operator<<(std::ostream& os, ast::Value const& value) {
    struct vis : boost::static_visitor<> {
        vis(std::ostream& os, std::string indent = "") : _os(os), _indent(indent) {}

        void operator()(std::map<std::string, ast::Value> const& map) const {
            _os << "map {\n";
            for (auto& entry : map) {
                _os << _indent << "    " << entry.first << '|';
                boost::apply_visitor(vis(_os, _indent+"    "), entry.second);
                _os << "\n";
            }
            _os << _indent << "}\n";
        }
        void operator()(std::string const& s) const {
            _os << s;
        }

      private:
        std::ostream& _os;
        std::string _indent;
    };
    boost::apply_visitor(vis(os), value);
    return os;
}

int main()
{
    std::string const Example1 = "Key1=Value1 ; Key2 = 01-Jan-2015; Key3 = 2.7181; Key4 = Johnny";
    std::string const Example2 = "Key1 = Value1; Key2 = {InnerK1 = one; IK2 = 11-Nov-2011;};";
    std::string const Example3 = "K1 = V1; K2 = {IK1=IV1; IK2=IV2;}; K3=V3; K4 = {JK1=JV1; JK2=JV2;};";

    std::cout << DoParse(Example1.begin(), Example1.end()) << "\n";
    std::cout << DoParse(Example2.begin(), Example2.end()) << "\n";
    std::cout << DoParse(Example3.begin(), Example3.end()) << "\n";
}

¹ You "had" it, but not in the right place! It should go before any Boost includes.

like image 81
sehe Avatar answered Nov 04 '22 06:11

sehe