
Why does the VS2008 std::string.erase() move its buffer?

I want to read a file line by line and capture one particular line of input. For maximum performance I could do this in a low-level way by reading the entire file in and iterating over its contents using pointers, but this code is not performance critical, so I would rather use a more readable and type-safe standard-library-style implementation.

So what I have is this:

 std::string line;
 line.reserve(1024);
 std::ifstream file(filePath);
 while(file)
 {
    std::getline(file, line);
    if(line.substr(0, 8) == "Whatever")
    {
        // Do something ...
    }
 }

While this isn't performance-critical code, I've called line.reserve(1024) before the parsing operation to preclude multiple reallocations of the string as larger lines are read in.

Inside std::getline the string is erased before the characters from each line are added to it. I stepped through this code to satisfy myself that the memory wasn't being reallocated on each iteration, and what I found fried my brain.

Deep inside string::erase, rather than just resetting its size variable to zero, it is actually calling memmove_s with pointer values that would overwrite the used part of the buffer with the unused part of the buffer immediately following it. However, memmove_s is being called with a count argument of zero, i.e. it is requesting a move of zero bytes.

Questions:

Why would I want the overhead of a library function call in the middle of my lovely loop, especially one that is being called to do nothing at all?

I haven't picked it apart myself yet but under what circumstances would this call not actually do nothing but would in fact start moving chunks of buffer around?

And why is it doing this at all?

Bonus question: What's the C++ standard library tag?

Neutrino asked Nov 17 '11 17:11




2 Answers

This is a known issue that I reported a year ago. To take advantage of the fix, you'll have to upgrade to a future version of the compiler.

Connect Bug: "std::string::erase is stupidly slow when erasing to the end, which impacts std::string::resize"

The standard doesn't say anything about the complexity of any std::string functions, except swap.

Ben Voigt answered Oct 23 '22 17:10


std::string::clear() is defined in terms of std::string::erase(), and std::string::erase() does have to move all of the characters after the block which was erased. So why shouldn't it call a standard function to do so? If you've got some profiler output which proves that this is a bottleneck, then perhaps you can complain about it, but otherwise, frankly, I can't see it making a difference. (The logic necessary to avoid the call could end up costing more than the call.)
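To illustrate why the move can be a no-op in one case and real work in another, here is a minimal sketch of an erase over a fixed buffer. ToyString, make_toy and toy_erase are hypothetical names introduced for illustration; this is not the VS2008 implementation, and it uses std::memmove rather than memmove_s. The number of bytes moved is the length of the tail after the erased block, so erasing to the end moves zero bytes, while erasing from the front moves everything that remains.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical toy buffer, NOT the real std::string layout.
struct ToyString {
    char        data[64];
    std::size_t size;
    std::size_t last_move_count;  // bytes moved by the most recent erase
};

// Build a toy string; assumes `text` fits in the 64-byte buffer.
ToyString make_toy(const char* text) {
    ToyString s;
    s.size = std::strlen(text);
    std::memcpy(s.data, text, s.size);
    s.last_move_count = 0;
    return s;
}

// Erase up to `count` characters starting at `pos`.
void toy_erase(ToyString& s, std::size_t pos, std::size_t count) {
    assert(pos <= s.size);
    if (count > s.size - pos) count = s.size - pos;  // clamp to the end of the string
    std::size_t tail = s.size - pos - count;         // characters after the erased block
    std::memmove(s.data + pos, s.data + pos + count, tail);  // moves nothing when tail == 0
    s.size -= count;
    s.last_move_count = tail;
}
```

Calling toy_erase(s, 0, s.size), which is what clearing the whole string amounts to, leaves a tail of zero and so requests a move of zero bytes. Calling toy_erase(s, 0, 7) on "Hello, world" has to slide the five characters of "world" down to the front.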

Also, you're not checking the results of the call to getline before using them. Your loop should be something like:

while ( std::getline( file, line ) ) {
    //  ...
}

And if you're so worried about performance, creating a substring (a new std::string) just in order to do a comparison is far more expensive than a call to memmove_s. What's wrong with something like:

static std::string const target( "Whatever" );
if ( line.size() >= target.size()
        && std::equal( target.begin(), target.end(), line.begin() ) ) {
    //  ...
}

I'd consider this the most idiomatic way of determining whether a string starts with a specific value.
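Putting the two suggestions together, a minimal sketch might look like the following. The names starts_with and find_line are hypothetical helpers introduced for illustration, and the input is taken as any std::istream so that the example does not depend on a particular file.

```cpp
#include <algorithm>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: true if `line` begins with `prefix`.
// The size check comes first so std::equal never reads past the end of `line`.
bool starts_with(const std::string& line, const std::string& prefix) {
    return line.size() >= prefix.size()
        && std::equal(prefix.begin(), prefix.end(), line.begin());
}

// Hypothetical helper: returns the first line of `in` that begins with
// `prefix`, or an empty string if no such line exists.
std::string find_line(std::istream& in, const std::string& prefix) {
    std::string line;
    while (std::getline(in, line)) {  // the loop body only runs after a successful read
        if (starts_with(line, prefix)) {
            return line;
        }
    }
    return std::string();
}
```

With a std::ifstream opened on the file, find_line(file, "Whatever") would return the line of interest without creating a temporary substring on every iteration.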

(I might add that from experience, the reserve here doesn't buy you much either. After you've read a couple of lines in the file, your string isn't going to grow much anyway, so there'll be very few reallocations after the first couple of lines. Another case of premature optimization?)
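One way to check this claim on a given implementation is to count how often the line buffer's capacity changes while reading. The sketch below does that; count_capacity_changes is a hypothetical helper, and it assumes (as is common in practice, though not something the standard promises) that the string keeps its capacity when getline erases it.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: reads every line of `in` into one reused string and
// counts how many times that string's capacity differs from the value
// observed after the previous read.
std::size_t count_capacity_changes(std::istream& in) {
    std::string line;
    std::size_t changes = 0;
    std::size_t last = line.capacity();
    while (std::getline(in, line)) {
        if (line.capacity() != last) {
            ++changes;
            last = line.capacity();
        }
    }
    return changes;
}
```

On an input with many lines of similar length, the count would be expected to stay very small regardless of how many lines are read, which is the behaviour described above.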

James Kanze answered Oct 23 '22 16:10