
Is it recommended to std::move a string that is going to be overwritten into a container?

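I have the following code:

#include <iostream>
#include <string>
#include <utility>
#include <vector>

int main() {
  std::vector<std::string> lines;
  std::string currentLine;

  while (std::getline(std::cin, currentLine)) {
    // option 1: move the line into the vector
    // lines.push_back(std::move(currentLine));

    // option 2: copy the line into the vector
    // lines.push_back(currentLine);
  }
}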

I see different costs for the two:

  1. The first approach leaves currentLine in a moved-from (typically empty) state, so getline has to allocate a new buffer for the next line. In exchange, the string in the vector takes over the existing buffer without copying.

  2. The second approach lets getline reuse currentLine's buffer, but requires allocating a new buffer for the string stored in the vector.

In such situations, is there a "better" way? Can the compiler optimize one approach more effectively than the other? Or are there clever string implementations that make one option much faster than the other?

Johannes Schaub - litb asked Aug 20 '12 21:08



1 Answer

Given the prevalence of the short string optimization (SSO), my immediate guess is that in many cases none of this will make any difference at all. With SSO, moving a short string ends up copying the contained characters anyway, even though the source is an rvalue and therefore eligible to be moved from.
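To make the SSO point concrete, here is a minimal sketch. The helper name move_steals_buffer is made up for illustration; it compares the data() pointer before and after a move construction. On the common implementations (libstdc++, libc++, MSVC) one would expect it to report true for a long, heap-allocated string and false for a short one held in the SSO buffer, although the size threshold is implementation-specific and the standard does not require SSO at all.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical helper, for illustration only.
// Returns true if move-constructing from `source` handed its character
// buffer over to the new string (same data() pointer before and after).
bool move_steals_buffer(std::string source) {
    const char* before = source.data();
    const std::size_t length = source.size();

    std::string destination(std::move(source));

    // The contents always arrive in the destination, whether the buffer
    // was transferred or the characters were copied.
    assert(destination.size() == length);

    return destination.data() == before;
}
```

Calling it with something like std::string(1000, 'x') and then with "hi" shows the two cases side by side.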

Between the two you've given, I think I'd tend to favor the non-moving version, but I doubt it's going to make a big difference. Given that (most of the time) you're going to re-use the source immediately after the move, I doubt that moving will do much good, even at best. Assuming SSO isn't involved, your choice is between creating a new string in the vector to hold a copy of the line you read, or moving from the string you read and (in essence) creating a new string to hold the next line in the next iteration. Either way, the expensive part (allocating a buffer to hold the string and copying data into it) is going to be pretty much the same.
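For reference, the two options can be written as a pair of functions, which makes it easy to time them against each other on real input. This is only a sketch: the function names are invented here, and it takes a std::istream rather than std::cin so it can be fed from a string.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Option 1: move each line into the vector.
std::vector<std::string> read_lines_moving(std::istream& in) {
    std::vector<std::string> lines;
    std::string currentLine;
    while (std::getline(in, currentLine)) {
        lines.push_back(std::move(currentLine));
        // currentLine is now valid but unspecified (usually empty).
        // getline erases it before reading, so reusing it is safe, but a
        // long next line will need a fresh allocation.
    }
    return lines;
}

// Option 2: copy each line into the vector.
std::vector<std::string> read_lines_copying(std::istream& in) {
    std::vector<std::string> lines;
    std::string currentLine;
    while (std::getline(in, currentLine)) {
        // The copy allocates for the vector element (unless SSO applies);
        // currentLine keeps its capacity for the next getline call.
        lines.push_back(currentLine);
    }
    return lines;
}

// Both strategies must yield the same lines; only the allocation
// pattern differs.
void check_same_result(const std::string& text) {
    std::istringstream first(text);
    std::istringstream second(text);
    assert(read_lines_moving(first) == read_lines_copying(second));
}
```

In both loops there is roughly one allocation and one character copy per long line, which is why measuring on representative data is more informative than reasoning about it.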

As far as "is there a better way" goes, I can think of at least a couple of possibilities. The most obvious would be to memory map the file, then walk through that buffer, find the ends of lines, and use emplace_back to create strings in the vector directly from the data in the buffer, with no intermediate strings at all.

That does have the minor disadvantage of memory mapping not being standardized -- if you can't live with that level of non-portability, you can read the whole file into a buffer instead of memory mapping.
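A rough sketch of the portable variant (read everything into one buffer, then split) might look like the following. The names read_all and split_lines are invented for this example, and a memory-mapped region could be walked the same way given a pointer and a length.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <iterator>
#include <sstream>  // std::istream, plus std::istringstream for in-memory input
#include <string>
#include <vector>

// Read the whole stream into one buffer.
std::string read_all(std::istream& in) {
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Split `buffer` on '\n', constructing each line directly inside the
// vector. Like getline, a trailing '\n' does not add an empty last line.
std::vector<std::string> split_lines(const std::string& buffer) {
    std::vector<std::string> lines;
    const char* cursor = buffer.data();
    const char* const end = cursor + buffer.size();

    while (cursor != end) {
        const void* hit =
            std::memchr(cursor, '\n', static_cast<std::size_t>(end - cursor));
        const char* line_end = hit ? static_cast<const char*>(hit) : end;
        assert(cursor <= line_end && line_end <= end);

        // Builds the string in place from (pointer, length); no
        // intermediate std::string is created.
        lines.emplace_back(cursor, static_cast<std::size_t>(line_end - cursor));

        // Skip past the newline, or stop if this was the last line.
        cursor = (line_end == end) ? end : line_end + 1;
    }
    return lines;
}
```

Usage would be along the lines of split_lines(read_all(std::cin)). Each line still costs one allocation (when it does not fit in the SSO buffer), but the per-line getline bookkeeping and the temporary string go away.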

The next possibility after that would be to create a class with an interface like a const string's that just maintains a pointer to the data in the big buffer instead of making a copy of it (e.g., Clang uses something like this). This will typically reduce total allocation, heap fragmentation, etc., but if you (for example) need to modify the strings afterward, it's unlikely to be of much (if any) use.
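As an illustration of that idea, here is a deliberately tiny non-owning line type. LineRef and split_line_refs are hypothetical names, not taken from any library; C++17's std::string_view provides the same concept in the standard library. The sketch assumes the big buffer outlives every LineRef and is not modified while they are in use.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical non-owning view of one line inside a larger buffer.
// Valid only while that buffer is alive and unmodified.
struct LineRef {
    const char* data;
    std::size_t size;

    // Read-only character access with a debug bounds check.
    char at(std::size_t i) const {
        assert(i < size);
        return data[i];
    }

    // Make an owning copy only when one is really needed.
    std::string str() const { return std::string(data, size); }
};

// Record where each line starts and how long it is; no characters are
// copied and nothing is allocated per line.
// Do not pass a temporary: the returned refs would dangle.
std::vector<LineRef> split_line_refs(const std::string& buffer) {
    std::vector<LineRef> lines;
    std::size_t start = 0;
    while (start < buffer.size()) {
        std::size_t stop = buffer.find('\n', start);
        if (stop == std::string::npos) stop = buffer.size();
        lines.push_back(LineRef{buffer.data() + start, stop - start});
        start = stop + 1;
    }
    return lines;
}
```

The only allocation left is the vector's own storage, which is where the reduction in total allocation and fragmentation comes from. The cost is the lifetime constraint, and that the lines cannot be edited in place.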

Jerry Coffin answered Oct 15 '22 14:10