Returned value The first time strtok() is called, it returns a pointer to the first token in string1. In later calls with the same token string, strtok() returns a pointer to the next token in the string. A NULL pointer is returned when there are no more tokens. All tokens are NULL-terminated.
The strtok() function in C++ returns the next token in a C-string (null terminated byte string). "Tokens" are smaller chunks of the string that are separated by a specified character, called the delimiting character.
Because strtok() modifies the initial string to be parsed, the string is subsequently unsafe and cannot be used in its original form. If you need to preserve the original string, copy it into a buffer and pass the address of the buffer to strtok() instead of the original string.
The common solution to tokenize a string in C++ is using std::istringstream , which is a stream class to operate on strings. The following code extract tokens from the stream using the extraction operator and insert them into a container.
#include <iostream>
#include <string>
#include <sstream>
int main(){
std::string myText("some-text-to-tokenize");
std::istringstream iss(myText);
std::string token;
while (std::getline(iss, token, '-'))
{
std::cout << token << std::endl;
}
return 0;
}
Or, as mentioned, use boost for more flexibility.
If boost is available on your system (I think it's standard on most Linux distros these days), it has a Tokenizer class you can use.
If not, then a quick Google turns up a hand-rolled tokenizer for std::string that you can probably just copy and paste. It's very short.
And, if you don't like either of those, then here's a split() function I wrote to make my life easier. It'll break a string into pieces using any of the chars in "delim" as separators. Pieces are appended to the "parts" vector:
void split(const string& str, const string& delim, vector<string>& parts) {
size_t start, end = 0;
while (end < str.size()) {
start = end;
while (start < str.size() && (delim.find(str[start]) != string::npos)) {
start++; // skip initial whitespace
}
end = start;
while (end < str.size() && (delim.find(str[end]) == string::npos)) {
end++; // skip to end of word
}
if (end-start != 0) { // just ignore zero-length strings.
parts.push_back(string(str, start, end-start));
}
}
}
Duplicate the string, tokenize it, then free it.
char *dup = strdup(str.c_str());
token = strtok(dup, " ");
free(dup);
There is a more elegant solution.
With std::string you can use resize() to allocate a suitably large buffer, and &s[0] to get a pointer to the internal buffer.
At this point many fine folks will jump and yell at the screen. But this is the fact. About 2 years ago
the library working group decided (meeting at Lillehammer) that just like for std::vector, std::string should also formally, not just in practice, have a guaranteed contiguous buffer.
The other concern is does strtok() increases the size of the string. The MSDN documentation says:
Each call to strtok modifies strToken by inserting a null character after the token returned by that call.
But this is not correct. Actually the function replaces the first occurrence of a separator character with \0. No change in the size of the string. If we have this string:
one-two---three--four
we will end up with
one\0two\0--three\0-four
So my solution is very simple:
std::string str("some-text-to-split");
char seps[] = "-";
char *token;
token = strtok( &str[0], seps );
while( token != NULL )
{
/* Do your thing */
token = strtok( NULL, seps );
}
Read the discussion on http://www.archivum.info/comp.lang.c++/2008-05/02889/does_std::string_have_something_like_CString::GetBuffer
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With