Simple string parsing with C++

Tags:

c++

This is an attempt using only standard C++.

Most of the time I use a combination of std::istringstream and std::getline (which can split on any single delimiter character, not just newlines) to get what I want. And if I can, I make my config files look like:

foo=1,2,3,4

which makes it easy.

The text file looks like this:

foo=1,2,3,4
bar=0

And you parse it like this:

#include <fstream>
#include <iostream>
#include <sstream>
#include <string>

int main()
{
    std::ifstream file( "sample.txt" );

    std::string line;
    while( std::getline( file, line ) )
    {
        std::istringstream iss( line );

        // Everything before the '=' is the key.
        std::string result;
        if( std::getline( iss, result , '=') )
        {
            if( result == "foo" )
            {
                // The rest of the line is a comma-separated list.
                std::string token;
                while( std::getline( iss, token, ',' ) )
                {
                    std::cout << token << std::endl;
                }
            }
            if( result == "bar" )
            {
               //...
            }
        }
    }
}

The C++ String Toolkit Library (StrTk) has the following solution to your problem:

#include <string>
#include <deque>
#include "strtk.hpp"

int main()
{
   std::string file_name = "simple.txt";
   strtk::for_each_line(file_name,
                       [](const std::string& line)
                       {
                          std::deque<std::string> token_list;
                          strtk::parse(line,"[]: ",token_list);
                          if (token_list.empty()) return;

                          const std::string& key = token_list[0];

                          if (key == "foo")
                          {
                            //do 'foo' related thing with token_list[1] 
                            //and token_list[2]
                            return;
                          }

                          if (key == "bar")
                          {
                            //do 'bar' related thing with token_list[1]
                            return;
                          }

                       });

   return 0;
}

More examples can be found Here

Boost.Spirit is not reserved for parsing complicated structures. It is quite good at micro-parsing too, and almost matches the compactness of the C + scanf snippet:

#include <boost/spirit/include/qi.hpp>
#include <string>
#include <sstream>

using namespace boost::spirit::qi;

int main()
{
   std::string text = "foo: [3 4 5]\nbaz: 3.0";
   std::istringstream iss(text);

   std::string line;
   while (std::getline(iss, line))
   {
      // phrase_parse takes its first iterator by reference, so it needs an lvalue.
      std::string::iterator first = line.begin();
      int x, y, z;
      if (phrase_parse(first, line.end(), "foo: [" >> int_ >> int_ >> int_ >> "]", space, x, y, z))
         continue;
      first = line.begin(); // restart from the beginning of the line for the next attempt
      float w;
      if (phrase_parse(first, line.end(), "baz: " >> float_, space, w))
         continue;
   }
}

Why they didn't add a "container" version is beyond me; it would be much more convenient if we could just write:

if(phrase_parse(line, "foo: [">> int_ >> int_ >> int_ >> "]", space, x, y, z))
   continue;

But Spirit does have drawbacks:

It adds a lot of compile time overhead.
Error messages are brutal. If you make a small mistake with scanf, you just run your program and immediately get a segfault or an absurd parsed value. Make a small mistake with spirit and you will get hopeless gigantic error messages from the compiler and it takes a LOT of practice with boost.spirit to understand them.

So ultimately, for simple parsing I use scanf like everyone else...

Regular expressions can often be used for parsing strings. Use capture groups (parentheses) to get the various parts of the line being parsed.

For instance, to parse an expression like foo: [3 4 56], use the regular expression (.*): \[(\d+) (\d+) (\d+)\]. The first capture group will contain "foo", the second, third and fourth will contain the numbers 3, 4 and 56.

If there are several possible string formats that need to be parsed, like in the example given by the OP, either apply separate regular expressions one by one and see which one matches, or write a single regular expression that matches all the possible variations, typically using the | (alternation) operator.

Regular expressions are very flexible, so the expression can be extended to allow more variations: for instance, an arbitrary amount of whitespace after the : in the example, or numbers restricted to a certain number of digits.

As an added bonus, regular expressions provide implicit validation, since they require a perfect match. For instance, if the number 56 in the example above were replaced with 56x, the match would fail. This can also simplify code: in the example above, the groups containing the numbers can be safely converted to integers after a successful match without any additional checking.

Regular expressions usually perform well, and there are many good libraries to choose from, for instance Boost.Regex.
