Right now I have code set up to divide my string into tokens, using ,;= and space as delimiters. I would also like the special characters (the non-space delimiters) to be returned as tokens themselves.
char * cstr = new char [str.length()+1];
strcpy (cstr, str.c_str());
char * p = strtok (cstr," ,;="); // same delimiter set as inside the loop
while (p!=0)
{
whichType(p);
p = strtok(NULL," ,;=");
}
So right now, for a string such as "asd sdf qwe wer,sdf;wer", the printed tokens are:
asd
sdf
qwe
wer
sdf
wer
I want it to look like
asd
sdf
qwe
wer
,
sdf
;
wer
Any help would be great. Thanks
You need more flexibility. (Besides, strtok is a bad, error-prone interface.)
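To illustrate why strtok cannot give you the delimiters back: it overwrites each delimiter it finds with '\0' in the caller's buffer, and it keeps its scan position in hidden static state, so it is not reentrant either. The small sketch below shows the first effect; the helper name `buffer_after_first_strtok` is made up for this illustration.

```cpp
#include <cstring>
#include <string>

// Hypothetical helper, for illustration only: runs a single strtok call on a
// private copy of `text` and returns what that buffer now looks like when
// read as a C string.
std::string buffer_after_first_strtok(std::string text, const char* delims)
{
    // strtok writes into its argument, so it needs a mutable buffer.
    char* first = std::strtok(&text[0], delims);
    (void)first; // the token itself is not needed here

    // The first delimiter has been replaced by '\0', so reading the buffer
    // as a C string stops at the end of the first token.
    return std::string(text.c_str());
}
```

For the input "wer,sdf;wer" with delimiters ",;", the buffer reads back as just "wer": the comma is gone, which is why the original code has no way to report it as a token.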
Here's a flexible algorithm that generates tokens, copying them to an output iterator. This means you can use it to fill a container of your choice, or print it directly to an output stream (which is what I'll use as a demo).
The behaviour is specified in option flags:
enum tokenize_options
{
tokenize_skip_empty_tokens = 1 << 0,
tokenize_include_delimiters = 1 << 1,
tokenize_exclude_whitespace_delimiters = 1 << 2,
//
tokenize_options_none = 0,
tokenize_default_options = tokenize_skip_empty_tokens
| tokenize_exclude_whitespace_delimiters
| tokenize_include_delimiters,
};
Note how I distilled an extra requirement that you didn't state but that your sample implies: you want the delimiters output as tokens unless they're whitespace (' '). This is where the third option comes in: tokenize_exclude_whitespace_delimiters.
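As a quick illustration of how these flags combine and are tested, here is a minimal sketch. The enum is repeated so the snippet compiles on its own, and `has_option` is a hypothetical helper introduced only for this example.

```cpp
// Same flags as above, repeated so this snippet is self-contained.
enum tokenize_options
{
    tokenize_skip_empty_tokens             = 1 << 0,
    tokenize_include_delimiters            = 1 << 1,
    tokenize_exclude_whitespace_delimiters = 1 << 2,
    //
    tokenize_options_none    = 0,
    tokenize_default_options = tokenize_skip_empty_tokens
                             | tokenize_exclude_whitespace_delimiters
                             | tokenize_include_delimiters,
};

// Hypothetical helper (not part of the tokenizer): true if `flag` is set
// in the bit mask `options`.
constexpr bool has_option(int options, tokenize_options flag)
{
    return (options & flag) != 0;
}

// The default enables all three flags, i.e. the bit pattern 0b111.
static_assert(tokenize_default_options == 7, "all three flags set");
```

One thing to keep in mind: OR-ing two enumerators of a plain enum yields an int, so passing a custom combination to a parameter of type tokenize_options needs a static_cast<tokenize_options>(...).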
Now here's the real meat:
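#include <algorithm> // std::find
#include <cctype>    // std::isspace
#include <iostream>
#include <iterator>  // std::begin, std::end, std::ostream_iterator
#include <sstream>
#include <string>
template <typename Input, typename Delimiters, typename Out>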
Out tokenize(
Input const& input,
Delimiters const& delim,
Out out,
tokenize_options options = tokenize_default_options
)
{
// decode option flags
const bool includeDelim = options & tokenize_include_delimiters;
const bool excludeWsDelim = options & tokenize_exclude_whitespace_delimiters;
const bool skipEmpty = options & tokenize_skip_empty_tokens;
using namespace std;
string accum;
for(auto it = begin(input), last = end(input); it != last; ++it)
{
if (find(begin(delim), end(delim), *it) == end(delim))
{
accum += *it;
}
else
{
// output the token
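if (!(skipEmpty && accum.empty()))
*out++ = accum; // empty tokens are dropped when skipEmpty is set
// output the delimiter
// (cast to unsigned char: std::isspace is undefined for negative char values;
// '\0' counts as whitespace because a string literal's terminator is iterated too)
bool isWhitespace = std::isspace(static_cast<unsigned char>(*it)) || (*it == '\0');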
if (includeDelim && !(excludeWsDelim && isWhitespace))
{
*out++ = { *it }; // dump the delimiter as a separate token
}
accum.clear();
}
}
if (!accum.empty())
*out++ = accum;
return out;
}
A full demo is live on Ideone (default options) and on Coliru (no options):
int main()
{
// let's print tokens to stdout
std::ostringstream oss;
std::ostream_iterator<std::string> out(oss, "\n");
tokenize("asd sdf qwe wer,sdf;wer", " ;,", out/*, tokenize_options_none*/);
std::cout << oss.str();
// that's all, folks
}
Prints:
asd
sdf
qwe
wer
,
sdf
;
wer
I'm afraid you cannot use strtok for that; you'll need a proper tokenizer.
If your tokens are simple, I suggest you code it manually, i.e., that you scan the string character by character. If they're not, I suggest that you take a look at several alternatives. Or, if it's really complicated, that you use a special tool like flex.
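For the simple case in the question, scanning character by character could look roughly like the sketch below. It rests on a few assumptions: the function name `split_keep_delims` and its signature are made up for illustration, the default delimiter set is the " ,;=" from the question, and whitespace delimiters are dropped while all other delimiters are emitted as one-character tokens.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper (name and signature are illustrative).
// Splits `input` at any character found in `delims`. Non-whitespace
// delimiters become one-character tokens; whitespace ones are discarded.
std::vector<std::string> split_keep_delims(const std::string& input,
                                           const std::string& delims = " ,;=")
{
    std::vector<std::string> tokens;
    std::string current;

    for (char c : input)
    {
        if (delims.find(c) == std::string::npos)
        {
            current += c;              // ordinary character: extend the token
            continue;
        }
        if (!current.empty())          // a delimiter ends the pending token
        {
            tokens.push_back(current);
            current.clear();
        }
        // The cast avoids undefined behaviour for negative char values.
        if (!std::isspace(static_cast<unsigned char>(c)))
            tokens.push_back(std::string(1, c));
    }
    if (!current.empty())
        tokens.push_back(current);     // flush the last token
    return tokens;
}
```

Called on "asd sdf qwe wer,sdf;wer", this yields the eight tokens asked for in the question: asd, sdf, qwe, wer, the comma, sdf, the semicolon, and wer.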