
Splitting a string but keeping empty tokens in C++

Tags:

c++

tokenize

I am trying to split a string and put the pieces into a vector.

However, I also want to keep an empty token whenever there are consecutive delimiters.

For example:

string mystring = "::aa;;bb;cc;;c";

I would like to tokenize this string on the delimiters : and ; but wherever two delimiters are adjacent, such as :: and ;;, I would like to push an empty string into my vector.

So my desired output for this string is:

"" (empty)
aa
"" (empty)
bb
cc
"" (empty)
c

A further requirement is that I cannot use the Boost library.

Could anyone give me an idea of how to do this?

Thanks.

Here is my current code, which tokenizes a string but does not include the empty tokens:

void Tokenize(const string& str, vector<string>& tokens, const string& delimiters)
{
    // Skip delimiters at beginning.
    string::size_type lastPos = str.find_first_not_of(delimiters, 0);
    // Find the first delimiter after the start of the token.
    string::size_type pos = str.find_first_of(delimiters, lastPos);

    while (string::npos != pos || string::npos != lastPos)
    {
        // Found a token, add it to the vector.
        tokens.push_back(str.substr(lastPos, pos - lastPos));
        // Skip delimiters.  Note the "not_of"
        lastPos = str.find_first_not_of(delimiters, pos);
        // Find the next delimiter.
        pos = str.find_first_of(delimiters, lastPos);
    }
}
XDProgrammer asked Jun 12 '15 07:06



1 Answer

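You can make your algorithm work with a few simple changes. First, don't skip delimiters at the beginning. Second, instead of skipping over runs of delimiters in the middle of the string, just advance the position by one past each delimiter you find, so that two adjacent delimiters produce an empty token. Finally, the loop should only continue while both positions are not npos, so the check should use && instead of ||; the last token is then pushed once after the loop.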

void Tokenize(const string& str,vector<string>& tokens, const string& delimiters)
{
    // Start at the beginning
    string::size_type lastPos = 0;
    // Find position of the first delimiter
    string::size_type pos = str.find_first_of(delimiters, lastPos);

    // While we still have string to read
    while (string::npos != pos && string::npos != lastPos)
    {
        // Found a token, add it to the vector
        tokens.push_back(str.substr(lastPos, pos - lastPos));
        // Look at the next token instead of skipping delimiters
        lastPos = pos+1;
        // Find the position of the next delimiter
        pos = str.find_first_of(delimiters, lastPos);
    }

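    // Push the last token. pos is npos here, so substr takes the rest of
    // the string (an empty token if the string ended with a delimiter)
    tokens.push_back(str.substr(lastPos, pos - lastPos));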
}
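
For anyone who wants to try this outside the original program, here is a minimal self-contained sketch of the same loop. The name splitKeepEmpty and the choice to return the vector are assumptions made for illustration and are not part of the code above; the logic is otherwise the same.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (name chosen for this sketch): splits str on any
// character in delimiters and keeps empty tokens between adjacent delimiters.
std::vector<std::string> splitKeepEmpty(const std::string& str,
                                        const std::string& delimiters)
{
    std::vector<std::string> tokens;

    // Start of the current token
    std::string::size_type lastPos = 0;
    // Position of the next delimiter at or after lastPos
    std::string::size_type pos = str.find_first_of(delimiters, lastPos);

    while (pos != std::string::npos)
    {
        // Everything between lastPos and the delimiter is one token;
        // it is empty when two delimiters are adjacent.
        tokens.push_back(str.substr(lastPos, pos - lastPos));
        // Step past exactly one delimiter
        lastPos = pos + 1;
        pos = str.find_first_of(delimiters, lastPos);
    }

    // Whatever remains after the last delimiter (possibly empty)
    tokens.push_back(str.substr(lastPos));
    return tokens;
}
```

Calling splitKeepEmpty("::aa;;bb;cc;;c", ":;") gives eight tokens: "", "", "aa", "", "bb", "cc", "", "c". Note that the leading :: produces two empty tokens (one before the first colon and one between the two colons), which is one more than the list in the question. If only one is wanted, the first token can be dropped when the string begins with a delimiter.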
TartanLlama answered Sep 23 '22 11:09