
Performance overhead of C++ string tokenizing via istringstream

I would like to know the performance overhead of the following:

std::string line, word;
while (std::getline(std::cin, line))
{
    std::istringstream istream(line);
    while (istream >> word)
    {
        // parse word here
    }
}

I think this is the standard C++ way to tokenize input.

To be specific:

  • Is each line copied three times: first by getline, then by the istringstream constructor, and finally by operator>> for each word?
  • Would frequent construction and destruction of istream be an issue? What is the equivalent implementation if I define istream before the outer while loop?

Thanks!

Update:

An equivalent implementation

std::string line, word;
std::stringstream stream;
while (std::getline(std::cin, line))
{
    stream.clear();
    stream << line;
    while (stream >> word)
    {
        // parse word here
    }
}

uses the stream as a local buffer that lines are pushed into and words are popped out of. This would avoid the potentially frequent constructor and destructor calls of the previous version, and would take advantage of the stream's internal buffering (is this point correct?).

Alternative solutions might be to extend std::string to support operator<< and operator>>, or to extend iostream to support something like locate_new_line. Just brainstorming here.

csyangchen asked Jun 09 '12 12:06


1 Answer

Unfortunately, iostreams is not for performance-intensive work. The problem is not copying things in memory (copying strings is fast), it's virtual function dispatches, potentially to the tune of several indirect function calls per character.
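If a measurement does show the stream machinery to be the bottleneck, one way around it is to scan the line directly with std::string's search functions. The following is only a minimal sketch, assuming tokens are separated by ASCII whitespace; split_words and its delims parameter are hypothetical names, not part of the original question.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not from the original post): splits `line` on the
// given delimiter characters without constructing any stream object.
std::vector<std::string> split_words(const std::string& line,
                                     const char* delims = " \t\r\n\v\f")
{
    std::vector<std::string> words;
    std::size_t pos = 0;
    // Skip leading delimiters; stop when only delimiters (or nothing) remain.
    while ((pos = line.find_first_not_of(delims, pos)) != std::string::npos) {
        std::size_t end = line.find_first_of(delims, pos);
        if (end == std::string::npos)
            end = line.size();                     // last token runs to end of line
        words.emplace_back(line, pos, end - pos);  // copy just this token
        pos = end;
    }
    return words;
}
```

Each token is still copied once into the result, but no stream, locale, or sentry object is involved. Unlike operator>>, this sketch ignores the stream's locale and treats only the listed characters as separators.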

As for your question about copying, yes, as written everything gets copied when you initialize a new stringstream. (Characters also get copied from the stream to the output string by getline or >>, but that obviously can't be prevented.)

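Since C++20, istringstream has a constructor that takes the string by rvalue reference, so you can use std::move to eliminate the extraneous copy (in C++11 through C++17 the constructor takes a const reference, so the string is still copied even if you write std::move):

std::string line, word;
while (std::getline(std::cin, line)) // initialize line
{
    // move data from line into istream (line is left in a valid but
    // unspecified state until the next getline overwrites it):
    std::istringstream istream( std::move( line ) );
    while (istream >> word)
    {
        // parse word here
    }
}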
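On the question of defining the stream before the outer loop, a possible sketch is shown below. It assumes whitespace-separated words and simply counts them; count_words is a hypothetical name, and reading from a std::istream& parameter rather than std::cin directly is a choice made here so the function can be exercised on any input.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: counts whitespace-separated words in `in`, reusing one
// istringstream for every line instead of constructing a new one each time.
std::size_t count_words(std::istream& in)
{
    std::string line, word;
    std::istringstream words;   // constructed once, outside the loop
    std::size_t count = 0;
    while (std::getline(in, line)) {
        words.clear();          // reset eofbit/failbit left by the previous line
        words.str(line);        // replace the buffer contents (still copies line)
        while (words >> word)
            ++count;            // parse word here
    }
    return count;
}
```

Calling str(line) replaces the previous contents, so the internal buffer does not keep growing the way it does when each line is appended with operator<<. The line is still copied into the stream; only the repeated construction and destruction is avoided.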

All that said, performance is only an issue if a measurement tool tells you it is. Iostreams is flexible and robust, and filebuf is basically fast enough, so you can prototype the code so it works and then optimize the bottlenecks without rewriting everything.
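As a rough illustration of measuring before optimizing, the sketch below times repeated tokenization of one line with std::chrono::steady_clock. The names tokenize_repeatedly and timed_tokenize are hypothetical, and a real profiler run on real input will give more trustworthy numbers than a micro-benchmark like this.

```cpp
#include <chrono>
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper: tokenizes `line` with a fresh istringstream `repeats`
// times and returns the total number of words seen.
std::size_t tokenize_repeatedly(const std::string& line, int repeats)
{
    std::size_t count = 0;
    std::string word;
    for (int i = 0; i < repeats; ++i) {
        std::istringstream words(line);
        while (words >> word)
            ++count;
    }
    return count;
}

// Hypothetical helper: runs the function above and reports the elapsed
// wall-clock time in milliseconds through `elapsed_ms`.
std::size_t timed_tokenize(const std::string& line, int repeats, double& elapsed_ms)
{
    const auto start = std::chrono::steady_clock::now();
    const std::size_t count = tokenize_repeatedly(line, repeats);
    const auto stop = std::chrono::steady_clock::now();
    elapsed_ms = std::chrono::duration<double, std::milli>(stop - start).count();
    return count;
}
```

Returning the word count and using it afterwards helps keep an optimizing compiler from discarding the loop being timed. The same harness can wrap any alternative tokenizer so the two can be compared on identical input.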

Potatoswatter answered Nov 09 '22 11:11