
String tokenizer with multiple delimiters that keeps the delimiters, without Boost

I need to create a string parser in C++. I tried using:

vector<string> Tokenize(const string& strInput, const string& strDelims)
{
 vector<string> vS;

 string strOne = strInput;
 string delimiters = strDelims;

 int startpos = 0;
 int pos = strOne.find_first_of(delimiters, startpos);

 while (string::npos != pos || string::npos != startpos)
 {
  if(strOne.substr(startpos, pos - startpos) != "")
   vS.push_back(strOne.substr(startpos, pos - startpos));

  // if delimiter is a new line (\n) then add new line
  if(strOne.substr(pos, 1) == "\n")
   vS.push_back("\\n");
  // else if the delimiter is not a space
  else if (strOne.substr(pos, 1) != " ")
   vS.push_back(strOne.substr(pos, 1));

  if( string::npos == strOne.find_first_not_of(delimiters, pos) )
   startpos = strOne.find_first_not_of(delimiters, pos);
  else
   startpos = pos + 1;

  pos = strOne.find_first_of(delimiters, startpos);

 }

 return vS;
}

This works for 2X+7cos(3Y), called as

Tokenize("2X+7cos(3Y)", "+-/^() \t");

but gives a runtime error for 2X.

I need a non-Boost solution.

I also tried the C++ String Toolkit (StrTk) tokenizer:

std::vector<std::string> results;
strtk::split(delimiter, source,
             strtk::range_to_type_back_inserter(results),
             strtk::tokenize_options::include_all_delimiters);

 return results; 

but it doesn't give the delimiter as a separate string.

E.g., if I give the input as 2X+3Y, the output vector contains

2X+

3Y

user2473015 asked Jul 01 '15 04:07


2 Answers

What's probably happening is that this call crashes when it is passed npos:

lastPos = str.find_first_not_of(delimiters, pos);

Just add breaks to your loop instead of relying on the while clause to break out of it.

if (pos == string::npos)
  break;
lastPos = str.find_first_not_of(delimiters, pos);

if (lastPos == string::npos)
  break;
pos = str.find_first_of(delimiters, lastPos);
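
As a rough sketch of how a whole loop could look with an explicit break on npos, here is one possible version. The function name TokenizeWithBreaks is made up for illustration, and the handling of '\n' and ' ' is copied from the question rather than from the snippet above, so treat it as one reading of the idea, not the definitive fix.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: splits str on any character in delimiters and keeps
// each delimiter as its own token, except a space, which is dropped.
std::vector<std::string> TokenizeWithBreaks(const std::string& str,
                                            const std::string& delimiters)
{
    std::vector<std::string> tokens;
    std::string::size_type lastPos = 0;  // start of the current token

    while (lastPos < str.size())
    {
        std::string::size_type pos = str.find_first_of(delimiters, lastPos);
        if (pos == std::string::npos)
        {
            // No delimiter left: the rest of the string is the last token.
            tokens.push_back(str.substr(lastPos));
            break;
        }

        // Text between the previous delimiter and this one, if any.
        if (pos > lastPos)
            tokens.push_back(str.substr(lastPos, pos - lastPos));

        // pos is a valid index here, so str[pos] is safe to read.
        if (str[pos] == '\n')
            tokens.push_back("\\n");
        else if (str[pos] != ' ')
            tokens.push_back(std::string(1, str[pos]));

        lastPos = pos + 1;
    }
    return tokens;
}
```

Called as TokenizeWithBreaks("2X", "+-/^() \t"), this returns a vector holding only 2X, and pos is never used as an index once it is npos.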
QuestionC answered Oct 14 '22 18:10


The loop exit condition is broken:

while (string::npos != pos || string::npos != startpos)

It allows entry with, say, pos = npos and startpos = 1.

So

strOne.substr(startpos, pos - startpos)
strOne.substr(1, npos - 1)

The count is npos - 1, not npos, but substr clamps an oversized count to the end of the string, so this call survives unless startpos is past the end of the string.

If pos = npos and startpos = 0,

strOne.substr(startpos, pos - startpos)

lives, but

strOne.substr(pos, 1) == "\n"
strOne.substr(npos, 1) == "\n"

dies. So does

strOne.substr(pos, 1) != " "
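
To check which of these calls actually throw, a small probe like the one below can help. SubstrThrows is a hypothetical helper written for this illustration. It relies on the standard behaviour of std::string::substr: the call throws std::out_of_range only when the start position is greater than size(), while an oversized count is clamped to the end of the string.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical probe: returns true if s.substr(pos, count) throws
// std::out_of_range, false if the call succeeds.
bool SubstrThrows(const std::string& s,
                  std::string::size_type pos,
                  std::string::size_type count)
{
    try
    {
        const std::string piece = s.substr(pos, count);
        (void)piece;  // only the success or failure of the call matters
        return false;
    }
    catch (const std::out_of_range&)
    {
        return true;
    }
}
```

For the failing input 2X, SubstrThrows("2X", 0, std::string::npos) is false, while SubstrThrows("2X", std::string::npos, 1) is true, which matches the substr(pos, 1) calls being the ones that die.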

Sadly I'm out of time and can't solve this right now, but QuestionC's got the right idea. Better filtering. Something along the lines of:

    if (string::npos != pos)
    {
        if (strOne.substr(pos, 1) == "\n") // can possibly simplify this with strOne[pos] == '\n'
            vS.push_back("\\n");
        // else if the delimiter is not a space
        else if (strOne[pos] != ' ')
            vS.push_back(strOne.substr(pos, 1));
    }
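
For reference, this is roughly what the question's function might look like with that guard dropped in. The name TokenizeGuarded is made up, and the only other changes are assumptions of this sketch: size_type instead of int for the positions, and no copies of the arguments. The startpos update is deliberately left as in the question.

```cpp
#include <string>
#include <vector>

// Hypothetical variant of the question's Tokenize with the npos guard added.
std::vector<std::string> TokenizeGuarded(const std::string& strInput,
                                         const std::string& strDelims)
{
    std::vector<std::string> vS;

    std::string::size_type startpos = 0;
    std::string::size_type pos = strInput.find_first_of(strDelims, startpos);

    while (std::string::npos != pos || std::string::npos != startpos)
    {
        // An oversized count is clamped, so this is safe when pos is npos.
        const std::string token = strInput.substr(startpos, pos - startpos);
        if (!token.empty())
            vS.push_back(token);

        // Only look at the delimiter when one was actually found.
        if (std::string::npos != pos)
        {
            if (strInput[pos] == '\n')
                vS.push_back("\\n");
            else if (strInput[pos] != ' ')
                vS.push_back(strInput.substr(pos, 1));
        }

        // Same update as in the question: stop once only delimiters remain.
        if (std::string::npos == strInput.find_first_not_of(strDelims, pos))
            startpos = std::string::npos;
        else
            startpos = pos + 1;

        pos = strInput.find_first_of(strDelims, startpos);
    }

    return vS;
}
```

With the guard in place, the input 2X yields a single token instead of a runtime error. Because the startpos update is unchanged, a trailing run of delimiters such as "+)" still yields only its first delimiter.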
user4581301 answered Oct 14 '22 18:10