 

Boost spirit is too greedy

I'm torn between deep admiration for boost::spirit and eternal frustration at not understanding it ;)

I have problems with string rules that are too greedy, so the input doesn't match. Below is a minimal example that fails to parse because the txt rule eats up "end".

More information about what I'd like to do: the goal is to parse some pseudo-SQL, and I skip whitespace. In a statement like

select foo.id, bar.id from foo, baz 

I need to treat "from" as a special keyword. The rule is something like

"select" >> txt % ',' >> "from" >> txt % ',' 

but it obviously doesn't work, as it sees "bar.id from foo" as one item.

#include <boost/spirit/include/qi.hpp>
#include <iostream>
#include <string>
namespace qi = boost::spirit::qi;
int main(int, char**) {
    auto txt = +(qi::char_("a-zA-Z_"));
    auto rule = qi::lit("Hello") >> txt % ',' >> "end";
    std::string str = "HelloFoo,Moo,Bazend";
    std::string::iterator begin = str.begin();
    if (qi::parse(begin, str.end(), rule))
        std::cout << "Match !" << std::endl;
    else
        std::cout << "No match :'(" << std::endl;
}
Tristram Gräbener asked Mar 18 '11 17:03


1 Answer

Here's my version, with changes marked:

#include <boost/spirit/include/qi.hpp>
#include <algorithm>  // std::copy
#include <iostream>
#include <iterator>   // std::ostream_iterator
#include <string>
namespace qi = boost::spirit::qi;
int main(int, char**) {
  auto txt = qi::lexeme[+(qi::char_("a-zA-Z_"))];     // CHANGE: avoid eating spaces
  auto rule = qi::lit("Hello") >> txt % ',' >> "end";
  std::string str = "Hello Foo, Moo, Baz end";        // CHANGE: re-introduce spaces
  std::string::iterator begin = str.begin();
  if (qi::phrase_parse(begin, str.end(), rule, qi::ascii::space)) {          // CHANGE: use phrase_parse with a skipper
    std::cout << "Match !" << std::endl << "Remainder (should be empty): '"; // CHANGE: show if we parsed the whole string and not just a prefix
    std::copy(begin, str.end(), std::ostream_iterator<char>(std::cout));
    std::cout << "'" << std::endl;
  }
  else {
    std::cout << "No match :'(" << std::endl;
  }
}

This compiles and runs with GCC 4.4.3 and Boost 1.4something; output:

Match !
Remainder (should be empty): ''

Wrapping txt in lexeme[] disables the skipper inside it, so txt no longer skips whitespace between characters and stops at the first space. This yields the desired result: because "Baz" is not followed by a comma, and txt doesn't eat spaces, we never accidentally consume "end".

That said, I'm not 100% sure this is what you're looking for. In particular, is str missing spaces only for the sake of the example, or are you forced to use this (spaceless) format?

Side note: if you want to make sure that you've parsed the entire string, check whether begin == str.end() after the call. As written, your code reports a match even if only a prefix of str was parsed.

Update: added printing of the unparsed remainder.

phooji answered Sep 28 '22 05:09