 

C++ string parsing ideas

I have the output of another program that was intended to be human-readable rather than machine-readable, but I'm going to parse it anyway. It's nothing too complex.

Yet, I'm wondering what the best way to do this in C++ is. This is more of a 'general practice' type of question.

I looked into Boost.Spirit, and even got it working a bit. That thing is crazy! If I were designing the language that I was reading, it might be the right tool for the job. But as it is, given its extreme compile times and the several pages of errors from g++ when I do anything wrong, it's just not what I need. (I don't have much need for run-time performance either.)

I thought about using C++ operator>>, but that seems worthless. If my file has lines like "John has 5 widgets" and others like "Mary works at 459 Ramsy street", how can I even make sure I have a line of the first type and not the second? I guess I'd have to read the whole line and then use things like string::find and string::substr.

And that leaves sscanf. It would handle the above cases beautifully:

if( sscanf( str, "%s has %d widgets", chararr, & intvar ) == 2 )
      // then I know I matched "foo has bar" type of string, 
      // and I now have the parameters too

So I'm just wondering if I'm missing something or if C++ really doesn't have much built-in alternative.

Scott asked Feb 14 '11 03:02



1 Answer

sscanf does indeed sound like a pretty good fit for your requirements:

  • you may do some redundant parsing, but you don't have performance requirements prohibiting that
  • it localises the requirements on the different input words and allows parsing of non-string values directly into typed variables, making the different input formats easy to understand
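
As a rough illustration of the sscanf route, here is a minimal sketch. The helper name parse_widgets and the 64-byte buffer are assumptions made for the example, not anything from the question. It puts a field width on %s so the buffer can't overflow, and uses %n to confirm that the trailing literal matched too, since the return value only counts the two conversions:

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: returns true and fills name/count only if the line
// has the form "<word> has <int> widgets" with nothing after it.
// Note: count may still be overwritten when the function returns false.
bool parse_widgets(const std::string& line, std::string& name, int& count)
{
    char buf[64];      // arbitrary size; "%63s" leaves room for the NUL
    int consumed = 0;  // written by %n only if everything before it matched

    if (std::sscanf(line.c_str(), "%63s has %d widgets%n",
                    buf, &count, &consumed) == 2
        && consumed == static_cast<int>(line.size()))
    {
        name = buf;
        return true;
    }
    return false;
}
```

Called as parse_widgets(line, name, count), this should accept "John has 5 widgets" and reject both "Mary works at 459 Ramsy street" and "John has 5 gadgets"; without the %n check, the last of those would still return 2 from sscanf.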

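A potential problem is that sscanf is error-prone: the compiler isn't required to check the format string against the argument types, and an unbounded %s can overflow its buffer. If you have lots of oft-changing parsing phrases, the testing effort and risk can be worrying. Here's a sketch that keeps the spirit of sscanf but uses istream for type safety: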

#include <cstring>   // strcmp
#include <iostream>
#include <sstream>
#include <string>

// Str captures a string literal and consumes the same from an istream...
// (for non-literals, better to have `std::string` member to guarantee lifetime)
class Str
{
  public:
    Str(const char* p) : p_(p) { }
    const char* c_str() const { return p_; }
  private:
    const char* p_;
};

bool operator!=(const Str& lhs, const Str& rhs)
{
    return std::strcmp(lhs.c_str(), rhs.c_str()) != 0;
}

std::istream& operator>>(std::istream& is, const Str& str)
{
    std::string s;
    if (is >> s)
        if (s.c_str() != str)
            is.setstate(std::ios_base::failbit);
    return is;
}

// sample usage...

int main()
{
    std::stringstream is("Mary has 4 cats");
    int num_dogs, num_cats;

    if (is >> Str("Mary") >> Str("has") >> num_dogs >> Str("dogs"))
    {
        std::cout << num_dogs << " dogs\n";
    }
    else if (is.clear(), is.seekg(0), // "reset" the stream...
             (is >> Str("Mary") >> Str("has") >> num_cats >> Str("cats")))
    {
        std::cout << num_cats << " cats\n";
    }
}
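
To connect this back to the two line shapes in the question, the same idea can be applied one line at a time. This is only a sketch under assumptions: Lit is a compact stand-in for Str above (repeated so the block compiles on its own), at_end, Kind, and classify are hypothetical names, and it assumes a C++11 compiler. It builds a fresh std::istringstream per attempt rather than using clear() and seekg(0):

```cpp
#include <istream>
#include <sstream>
#include <string>

// Stand-in for Str above: consumes one word and fails the stream if it differs.
struct Lit { const char* p; };

std::istream& operator>>(std::istream& is, const Lit& lit)
{
    std::string word;
    if (is >> word && word != lit.p)
        is.setstate(std::ios_base::failbit);
    return is;
}

// Hypothetical helper: true only if nothing but whitespace is left.
bool at_end(std::istream& is)
{
    return (is >> std::ws).eof();
}

enum class Kind { Widgets, Address, Unknown };

// Hypothetical classifier for the two line shapes from the question.
// name and number may be overwritten even when Unknown is returned.
Kind classify(const std::string& line, std::string& name, int& number)
{
    std::istringstream a(line);
    if (a >> name >> Lit{"has"} >> number >> Lit{"widgets"} && at_end(a))
        return Kind::Widgets;

    std::istringstream b(line);  // fresh stream for the second pattern
    if (b >> name >> Lit{"works"} >> Lit{"at"} >> number)
        return Kind::Address;    // the street name is left unread in b

    return Kind::Unknown;
}
```

In a real program you would presumably feed classify each line obtained from std::getline and read the rest of the address from the stream as needed; that part is left out here.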
Tony Delroy answered Oct 12 '22 12:10
