
How to parse and verify an ordered list of integers using Boost.Spirit Qi

I'm parsing a text file, possibly several GB in size, consisting of lines as follows:

11 0.1
14 0.78
532 -3.5

Basically, one int and one float per line. The ints should be ordered and non-negative. I'd like to verify that the data are as described and get back the min and max int in the range. This is what I've come up with:

#include <iostream>
#include <string>

#include <boost/spirit/include/phoenix.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/fusion/include/std_pair.hpp>

namespace px = boost::phoenix;
namespace qi = boost::spirit::qi;

namespace my_parsers
{
using namespace qi;
using px::at_c;
using px::val;
template <typename Iterator>
struct verify_data : grammar<Iterator, locals<int>, std::pair<int, int>()>
{
    verify_data() : verify_data::base_type(section)
    {
        section
            =  line(val(0))    [ at_c<0>(_val) = _1]
            >> +line(_a)       [ _a = _1]
            >> eps             [ at_c<1>(_val) = _a]
            ;

        line
            %= (int_ >> other) [
                                   if_(_r1 >= _1)
                                   [
                                       std::cout << _r1 << " and "
                                       << _1 << val(" out of order\n")
                                   ]
                               ]
            ;

        other
            = omit[(lit(' ') | '\t') >> float_ >> eol];
    }
    rule<Iterator, locals<int>, std::pair<int, int>() > section;
    rule<Iterator, int(int)> line;
    rule<Iterator> other;
};
}

using namespace std;
int main(int argc, char** argv)
{
    string input("11 0.1\n"
                 "14 0.78\n"
                 "532 -3.6\n");

    my_parsers::verify_data<string::iterator> verifier;
    pair<int, int> p;
    std::string::iterator begin(input.begin()), end(input.end());
    cout << "parse result: " << boolalpha
         << qi::parse(begin, end, verifier, p) << endl; 
    cout << "p.first: " << p.first << "\np.second: " << p.second << endl;
    return 0;
}

What I'd like to know is the following:

  • Is there a better way of going about this? I have used inherited and synthesised attributes, local variables and a bit of phoenix voodoo. This is great; learning the tools is good but I can't help thinking there might be a much simpler way of achieving the same thing :/ (within a PEG parser that is...)
  • How could it be done without the local variable for instance?

More info: I have other data formats that are being parsed at the same time, so I'd like to keep the return value as a parser attribute. At the moment this is a std::pair. The other data formats, when parsed, will expose their own std::pairs, for instance, and it's these that I'd like to stuff in a std::vector.

asked Oct 10 '22 17:10 by dpj


1 Answer

This is at least a lot shorter already:

  • down to 28 LOC
  • no more locals
  • no more fusion at_c<> wizardry
  • no more inherited attributes
  • no more grammar class
  • no more manual iteration
  • using expectation points (see the other rule) to enhance parse error reporting
  • this parser expression synthesizes neatly into a vector<int> if you choose to assign it with %= (but that will cost performance, besides potentially allocating a largish array)

#include <boost/spirit/include/phoenix.hpp>
#include <boost/spirit/include/qi.hpp>

namespace px = boost::phoenix;
namespace qi = boost::spirit::qi;

typedef std::string::iterator It;

int main(int argc, char** argv)
{
    std::string input("11 0.1\n"
            "14 0.78\n"
            "532 -3.6\n");

    int min=-1, max=0;
    {
        using namespace qi;
        using px::val;
        using px::ref;

        It begin(input.begin()), end(input.end());
        rule<It> index = int_ 
            [
                if_(ref(max) < _1)  [ ref(max) = _1 ] .else_ [ std::cout << _1 << val(" out of order\n") ],
                if_(ref(min) <  0)  [ ref(min) = _1 ]
            ] ;

        rule<It> other = char_(" \t") > float_ > eol;

        std::cout << "parse result: " << std::boolalpha 
                  << qi::parse(begin, end, index % other) << std::endl; 
    }
    std::cout << "min: " << min << "\nmax: " << max << std::endl;
    return 0;
}

Bonus

I might suggest taking the validation out of the expression and making it a free-standing function. Of course, this makes things more verbose (and... legible), and my braindead sample uses global variables, but I trust you know how to use boost::bind or px::bind to make it more real-life.
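
One way to drop the globals, sketched here with the standard library only: keep the running state in a small struct and bind it to the free function. std::bind stands in for the boost::bind or px::bind mentioned above. IndexState, the extra state parameter of validate_index, and check_indexes are hypothetical names for illustration, and the loop feeds the indexes by hand where the parser would invoke the action. Starting max at -1 (so that a first index of 0 is accepted) is also an assumption.

```cpp
#include <functional>
#include <iostream>
#include <vector>

// Hypothetical state holder replacing the global variables.
struct IndexState {
    int min = -1;        // -1 means "no valid index seen yet"
    int max = -1;        // assumption: start below 0 so a first index of 0 is accepted
    int linenumber = 0;
    int errors = 0;      // count of rejected indexes
};

// Same idea as the free function, but the state is passed in explicitly.
void validate_index(int index, IndexState& state)
{
    ++state.linenumber;
    // max is never below -1, so this test also rejects negative indexes.
    if (index <= state.max) {
        ++state.errors;
        std::cout << index << " out of order or negative at line "
                  << state.linenumber << std::endl;
        return;
    }
    if (state.min < 0) state.min = index;
    state.max = index;
}

// Feeds the indexes to the bound validator, one call per index.
IndexState check_indexes(const std::vector<int>& indexes)
{
    IndexState state;
    // std::bind fixes the state argument; the result is callable with a single int.
    auto action = std::bind(validate_index, std::placeholders::_1, std::ref(state));
    for (int index : indexes) action(index);
    return state;
}
```

For the sample indexes, check_indexes({11, 14, 532}) leaves min == 11, max == 532 and errors == 0.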

In addition to the above

  • down to 27 LOC even with the free function
  • no more phoenix, no more phoenix includes (yay compile times)
  • no more phoenix expression types in debug builds ballooning the binary and slowing it down
  • no more var, ref, if_, .else_ and the wretched operator, (which was a major bug risk at one time, because the overload was not included with phoenix.hpp)
  • easily ported to C++0x lambdas, immediately removing the need for global variables
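
The lambda port mentioned in the last bullet might look like the following sketch. It uses only the standard library: std::for_each stands in for the parser invoking the action on each parsed index, and min_max_of is a hypothetical helper name. How the lambda gets attached to int_ is left out here. One thing to watch is that writing a lambda directly inside the action brackets produces `[[`, which C++11 reserves for attributes, so a named variable as below is the safer form. Starting max at -1 instead of 0 is an assumption, made so that a first index of 0 is accepted.

```cpp
#include <algorithm>
#include <iostream>
#include <utility>
#include <vector>

// Hypothetical helper: returns {min, max} over the indexes that were in order.
std::pair<int, int> min_max_of(const std::vector<int>& indexes)
{
    int min = -1, max = -1, linenumber = 0;   // locals instead of globals

    // The lambda captures the locals by reference, so no global state is needed.
    auto validate_index = [&](int index) {
        ++linenumber;
        if (index <= max) {
            std::cout << index << " out of order at line " << linenumber << std::endl;
            return;
        }
        if (min < 0) min = index;
        max = index;
    };

    // Copies of the lambda share the captured references, so passing it by value is fine.
    std::for_each(indexes.begin(), indexes.end(), validate_index);
    return std::make_pair(min, max);
}
```

With the sample indexes, min_max_of({11, 14, 532}) returns the pair (11, 532).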

#include <boost/spirit/include/qi.hpp>
namespace qi = boost::spirit::qi;
typedef std::string::iterator It;

int min=-1, max=0, linenumber=0;
void validate_index(int index)
{
    linenumber++;
    if (min < 0)     min = index;
    if (max < index) max = index;
    else             std::cout << index << " out of order at line " << linenumber << std::endl;
}

int main(int argc, char** argv)
{
    std::string input("11 0.1\n"
            "14 0.78\n"
            "532 -3.6\n");
    It begin(input.begin()), end(input.end());

    {
        using namespace qi;

        rule<It> index = int_ [ validate_index ] ;
        rule<It> other = char_(" \t") > float_ > eol;
        std::cout << "parse result: " << std::boolalpha 
                  << qi::parse(begin, end, index % other) << std::endl; 
    }
    std::cout << "min: " << min << "\nmax: " << max << std::endl;
    return 0;
}

answered Oct 14 '22 05:10 by sehe