
How are textual data files parsed in modern C++?

I am (too) often confronted with the task of having to parse textual data files -- the kind of textual structured data representation you used before "everyone" used XML -- that are some kind of industry standard. (There are too many of these.)

Anyway, the basic task is always to take a text file and load its contents into some kind of data structure so that our C++ code can do something with the info.

Now, I have implemented a few simple (and oh so buggy) parsers by hand, and there is little I despise more. :-)

So - I was wondering what the current state of the art is when I want to "parse" structured textual data into an in-memory representation (think: XML data binding for an arbitrary language).

What I found so far was "What parser generator do you recommend", but I'm not so sure I'm after a parser generator (like ANTLR).

Obvious candidates seem to be pegtl and Boost.Spirit, but both seem rather complicated (though at least they're in-language), and the last time I tried Spirit, the compiler errors drove me nuts. Also, pegtl needs a C++11-compatible compiler, which is still a problem here (VC++ 2005).

So am I missing a simpler solution for just getting something like

/begin COMPU_METHOD
  DEC "  Decimal value"
  RAT_FUNC
  "%3.0"
  "dec"
  COEFFS 0 1.000000 0.000000 0 0.000000 1.000000
/end COMPU_METHOD

into a C++ data structure? (This is just an arbitrary example of what part of such a file may look like. For this format I could (and probably should) buy a library to parse it, as it is widespread enough -- which is not the case for all the formats I encounter.)

-- or should I just go for the complexity of, say Boost.Spirit?
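To make the goal concrete, here is a minimal hand-rolled sketch of the kind of result I am after, using only the standard library. Everything in it is an assumption for illustration: the `CompuMethod` struct, its field names (guessed from the sample, not taken from any format specification), and the `parse_compu_method` function are hypothetical. It also relies on `std::quoted` (C++14), so it would not build with VC++ 2005 as-is.

```cpp
#include <iomanip>    // std::quoted (C++14)
#include <istream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical target type; field meanings are guessed from the sample block.
struct CompuMethod {
    std::string name;            // DEC
    std::string long_name;       // "  Decimal value"
    std::string kind;            // RAT_FUNC
    std::string format;          // "%3.0"
    std::string unit;            // "dec"
    std::vector<double> coeffs;  // numbers following COEFFS
};

// Reads one "/begin COMPU_METHOD ... /end COMPU_METHOD" block from `in`.
// Throws std::runtime_error on structural errors and std::invalid_argument
// (from std::stod) if a coefficient does not start with a number.
CompuMethod parse_compu_method(std::istream& in) {
    CompuMethod m;
    std::string tok, block;

    if (!(in >> tok >> block) || tok != "/begin" || block != "COMPU_METHOD")
        throw std::runtime_error("expected '/begin COMPU_METHOD'");

    // std::quoted strips the surrounding quotes and keeps embedded spaces.
    if (!(in >> m.name >> std::quoted(m.long_name) >> m.kind
             >> std::quoted(m.format) >> std::quoted(m.unit)))
        throw std::runtime_error("malformed COMPU_METHOD header");

    if (!(in >> tok) || tok != "COEFFS")
        throw std::runtime_error("expected 'COEFFS'");

    // Collect whitespace-separated numbers until the closing "/end" token.
    while (in >> tok && tok != "/end")
        m.coeffs.push_back(std::stod(tok));

    if (tok != "/end" || !(in >> block) || block != "COMPU_METHOD")
        throw std::runtime_error("expected '/end COMPU_METHOD'");

    return m;
}
```

Feeding the sample block through a `std::istringstream` should give `name == "DEC"`, a `long_name` that keeps its two leading spaces, and six coefficients. Even for a block this small, the error handling is all manual and nothing is reusable for the next keyword, which is exactly what I would like to avoid.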

Martin Ba asked Nov 04 '11 09:11


1 Answer

  • Boost Spirit

    See

    • my answer here for a demo that resembles your sample;
    • a more advanced, shorter demo here that parses into a tree structure;
    • search for more samples.
  • Coco/R (C++)

    I have had good results with this very pragmatic parser generator, which supports many languages/platforms using a common grammar format. The speed of parsing is comparable to Boost Spirit (although the processing of parsed data may be more efficient using generic programming).

Edit: To make things perfectly clear, there has never been anything that I wasn't able to do with Coco/R.

However, I'm really addicted to the ease with which Spirit deduces attribute types (and conversions) for me generically. That is the main timesaver. There is a cost involved, though:

  • learning curve, maintenance
  • compile time (but parsers don't often change)

sehe answered Oct 03 '22 11:10