Splitting a string using strtok() in C

In C, the strtok() function is used to split a string into a series of tokens based on a particular delimiter. A token is a substring extracted from the original string.
String parsing in Java can be done with the String class itself. Its split method converts a String into an array by splitting it around the delimiter passed to it (the argument is treated as a regular expression). String parsing can also be done with StringTokenizer.
Fortunately, the C programming language has a standard C library function to do just that. The strtok function breaks up a line of data according to "delimiters" that divide each field. It provides a streamlined way to parse data from an input string.
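As a rough illustration of the strtok behaviour described above, here is a minimal sketch. It is written in C++ (via <cstring>) to match the rest of the code on this page, and the helper name split_with_strtok is made up for this example. Note that strtok writes into the buffer it is given and keeps hidden state between calls, so the sketch works on a private copy and is not safe to call from several threads at once.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper (not part of any library): splits `input` on any
// character found in `delims`, using the standard strtok function.
std::vector<std::string> split_with_strtok(const std::string& input, const char* delims)
{
    // strtok overwrites delimiters with '\0', so work on a mutable copy.
    std::vector<char> buffer(input.begin(), input.end());
    buffer.push_back('\0');

    std::vector<std::string> tokens;
    // First call passes the buffer; later calls pass nullptr to continue.
    for (char* tok = std::strtok(buffer.data(), delims);
         tok != nullptr;
         tok = std::strtok(nullptr, delims))
    {
        tokens.emplace_back(tok);
    }
    return tokens;
}
```

Calling split_with_strtok("foo=1,2,3,4", "=,") should give the five tokens foo, 1, 2, 3 and 4. Runs of adjacent delimiters are skipped, so empty fields are never reported.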
To parse, in computer science, is to separate a string of commands (usually a program) into more easily processed components, which are analyzed for correct syntax and then tagged to define each component. The computer can then process each program chunk and transform it into machine language.
This is an attempt using only standard C++.
Most of the time I use a combination of std::istringstream and std::getline (which can also split on a delimiter) to get what I want. And if I can, I make my config files look like:
foo=1,2,3,4
which makes it easy.
The text file looks like this:
foo=1,2,3,4
bar=0
And you parse it like this:
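#include <fstream>
#include <iostream>
#include <sstream>
#include <string>

int main()
{
    std::ifstream file( "sample.txt" );
    std::string line;
    // Read the file one line at a time.
    while( std::getline( file, line ) )
    {
        std::istringstream iss( line );
        std::string result;
        // Everything before '=' is the key.
        if( std::getline( iss, result, '=' ) )
        {
            if( result == "foo" )
            {
                // The rest of the line is a comma-separated list.
                std::string token;
                while( std::getline( iss, token, ',' ) )
                {
                    std::cout << token << std::endl;
                }
            }
            if( result == "bar" )
            {
                //...
            }
        }
    }
}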
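If you would rather collect the values than print them, the same std::istringstream plus std::getline idea can fill a map. This is only a sketch: parse_config is a name invented here, and it assumes every value is an integer (std::stoi throws if one is not).

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads "key=v1,v2,..." lines from `in` into a map.
// Assumes every value is an integer; std::stoi throws otherwise.
std::map<std::string, std::vector<int>> parse_config(std::istream& in)
{
    std::map<std::string, std::vector<int>> config;
    std::string line;
    while (std::getline(in, line))
    {
        std::istringstream iss(line);
        std::string key;
        // If no '=' was found, getline hits the end of the line and sets
        // eofbit, so lines without a key are skipped (as are empty lines).
        if (!std::getline(iss, key, '=') || iss.eof())
            continue;

        std::vector<int>& values = config[key];
        std::string token;
        while (std::getline(iss, token, ','))
            values.push_back(std::stoi(token));
    }
    return config;
}
```

It takes any std::istream, so it works with a std::ifstream opened on sample.txt as well as with a std::istringstream in a test.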
The C++ String Toolkit Library (StrTk) has the following solution to your problem:
#include <string>
#include <deque>
#include "strtk.hpp"

int main()
{
    std::string file_name = "simple.txt";
    strtk::for_each_line(file_name,
        [](const std::string& line)
        {
            std::deque<std::string> token_list;
            strtk::parse(line,"[]: ",token_list);
            if (token_list.empty()) return;
            const std::string& key = token_list[0];
            if (key == "foo")
            {
                //do 'foo' related thing with token_list[1]
                //and token_list[2]
                return;
            }
            if (key == "bar")
            {
                //do 'bar' related thing with token_list[1]
                return;
            }
        });
    return 0;
}
More examples can be found Here
Boost.Spirit is not reserved for parsing complicated structures. It is quite good at micro-parsing too, and almost matches the compactness of the C + scanf snippet:
#include <boost/spirit/include/qi.hpp>
#include <string>
#include <sstream>

using namespace boost::spirit::qi;

int main()
{
    std::string text = "foo: [3 4 5]\nbaz: 3.0";
    std::istringstream iss(text);
    std::string line;
    while (std::getline(iss, line))
    {
        int x, y, z;
        if(phrase_parse(line.begin(), line.end(), "foo: [" >> int_ >> int_ >> int_ >> "]", space, x, y, z))
            continue;
        float w;
        if(phrase_parse(line.begin(), line.end(), "baz: " >> float_, space, w))
            continue;
    }
}
Why they didn't add a "container" version is beyond me; it would be much more convenient if we could just write:
if(phrase_parse(line, "foo: [">> int_ >> int_ >> int_ >> "]", space, x, y, z))
continue;
But it's true that :
So ultimately, for simple parsing I use scanf like everyone else...
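For comparison, the scanf route for the same two line formats might look something like this. It is a sketch only: the helper names parse_foo and parse_baz are invented here, and std::sscanf from <cstdio> is used so the code stays C++.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: extracts three integers from a "foo: [a b c]" line.
bool parse_foo(const std::string& line, int& x, int& y, int& z)
{
    // sscanf returns the number of fields it assigned, so 3 means all
    // three integers were read. It does not tell us whether the trailing
    // ']' was present, which is one reason this is "simple parsing" only.
    return std::sscanf(line.c_str(), "foo: [%d %d %d]", &x, &y, &z) == 3;
}

// Hypothetical helper: extracts one float from a "baz: w" line.
bool parse_baz(const std::string& line, float& w)
{
    return std::sscanf(line.c_str(), "baz: %f", &w) == 1;
}
```

The usual loop is then: try parse_foo, and if it returns false try parse_baz, much like the phrase_parse calls above.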
Regular expressions can often be used for parsing strings. Use capture groups (parentheses) to get the various parts of the line being parsed.
For instance, to parse an expression like foo: [3 4 56], use the regular expression (.*): \[(\d+) (\d+) (\d+)\]. The first capture group will contain "foo"; the second, third and fourth will contain the numbers 3, 4 and 56.
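With the standard <regex> header (rather than Boost.Regex), the capture-group idea could be sketched as follows. The struct BracketLine and the function parse_bracketed are names made up for this example; the pattern is the one given above.

```cpp
#include <regex>
#include <string>

// Hypothetical result type for a line such as "foo: [3 4 56]".
struct BracketLine
{
    std::string key;
    int a;
    int b;
    int c;
};

// Hypothetical helper: returns true and fills `out` only if the whole
// line matches the pattern.
bool parse_bracketed(const std::string& line, BracketLine& out)
{
    static const std::regex pattern(R"((.*): \[(\d+) (\d+) (\d+)\])");
    std::smatch m;
    if (!std::regex_match(line, m, pattern))
        return false;

    out.key = m[1].str();        // first capture group: the name
    // Groups 2 to 4 are digits only, so std::stoi cannot hit a format
    // error here (it can still throw if the number is too large for int).
    out.a = std::stoi(m[2].str());
    out.b = std::stoi(m[3].str());
    out.c = std::stoi(m[4].str());
    return true;
}
```

std::regex_match requires the entire line to match, so a line such as foo: [3 4 56x] is rejected rather than partially parsed.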
If there are several possible string formats that need to be parsed, like in the example given by the OP, either apply separate regular expressions one by one and see which one matches, or write a single regular expression that matches all the possible variations, typically using the | (alternation) operator.
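The "one expression per format" option could look roughly like this, again using std::regex. The enum LineKind and the function classify_line are invented for the sketch, and the two formats are the foo: [3 4 5] and baz: 3.0 lines used earlier on this page.

```cpp
#include <regex>
#include <string>

// Hypothetical tag for which format a line matched.
enum class LineKind { Foo, Baz, Unknown };

// Hypothetical helper: tries each pattern in turn. On success the capture
// groups are left in `m`. `m` refers into `line`, so `line` must outlive it.
LineKind classify_line(const std::string& line, std::smatch& m)
{
    static const std::regex foo_re(R"(foo: \[(\d+) (\d+) (\d+)\])");
    static const std::regex baz_re(R"(baz: (\d+\.\d+))");

    if (std::regex_match(line, m, foo_re))
        return LineKind::Foo;
    if (std::regex_match(line, m, baz_re))
        return LineKind::Baz;
    return LineKind::Unknown;
}
```

Keeping the patterns separate makes it obvious which capture group means what for each format; with a single alternation the group numbers depend on which branch matched.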
Regular expressions are very flexible, so the expression can be extended to allow more variations, for instance an arbitrary number of spaces and other whitespace after the : in the example, or to only allow the numbers to contain a certain number of digits.
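As an illustration of those two extensions, here is one possible relaxed pattern. Both the exact pattern and the function name matches_relaxed are assumptions made for this sketch: \s* allows any whitespace after the colon, \s+ allows any whitespace between numbers, and \d{1,3} limits each number to at most three digits.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: accepts "name: [a b c]" with flexible whitespace
// and numbers of one to three digits.
bool matches_relaxed(const std::string& line)
{
    static const std::regex pattern(
        R"((\w+):\s*\[(\d{1,3})\s+(\d{1,3})\s+(\d{1,3})\])");
    return std::regex_match(line, pattern);
}
```

With this pattern, foo:[1 2 3] and a line with tabs between the numbers are both accepted, while a four-digit number is rejected.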
As an added bonus, regular expressions provide an implicit validation since they require a perfect match. For instance, if the number 56 in the example above was replaced with 56x, the match would fail. This can also simplify code as, in the example above, the groups containing the numbers can be safely converted to integers without any additional format checking being required after a successful match.
Regular expressions usually perform well, and there are many good libraries to choose from, for instance Boost.Regex.