(Disclaimer: I don't know what the C++ standard might say about this..I know, I'm horrible)
while operating on very large strings I noticed that std::string is using copy-on-write. I managed to write the smallest loop that would reproduce the observed behaviour and the following one, for instance, runs suspiciously fast:
#include <string>
using std::string;
int main(void) {
string basestr(1024 * 1024 * 10, 'A');
for (int i = 0; i < 100; i++) {
string a_copy = basestr;
}
}
when adding a write in the loop body a_copy[1] = 'B';
, an actual copy apparently took place, and the program ran in 0.3s instead of a few milliseconds. 100 writes slowed it down by about 100 times.
But then it got weird. Some of my strings weren't written to, only read from, and this was not reflected in the execution time, which was almost exactly proportional to the number of operations on the strings. With some digging, I found that simply reading from a string still gave me that performance hit, so it led me to assume GNU STL strings are using copy-on-read (?).
#include <string>
using std::string;
int main(void) {
string basestr(1024 * 1024 * 10, 'A');
for (int i = 0; i < 100; i++) {
string a_copy = basestr;
a_copy[99]; // this also ran in 0.3s!
}
}
After revelling in my discovery for a while, I found out that reading (with operator[]) from the base string also takes 0.3s for the entire toy program..I'm not 100% comfortable with this. Are STL strings indeed copy-on-read, or are they allowing copy-on-write at all? I'm led to think that operator[] has some safeguards against one who would keep the reference it returns and later write to it; is this really the case? If not, what is really happening? If someone can point to some relevant section in the C++ standard, that'd also be appreciated.
For reference, I'm using g++ (Ubuntu 4.4.3-4ubuntu5) 4.4.3
, and the GNU STL.
COW, short for copy on write, is a way to implement mutable strings so that creating strings and logically copying strings, is reduced to almost nothing; conceptually they become free operations like no-ops.
To implement copy-on-write, a smart pointer to the real content is used to encapsulate the object's value, and on each modification an object reference count is checked; if the object is referenced more than once, a copy of the content is created before modification.
The c++ solution for strings are quite different from the c-version. The first and most important difference is while the c using the ASCIIZ solution, the std::string and std::wstring are using two iterators (pointers) to store the actual string.
So, yes it is ref counted. Also, from the discussion here: Yes, std::string will be made non-reference counting at some point, but as a non-reference-counted string is valid in C++98 as well, one option would be to switch to a non-ref-counted string for both -std=c++98 and -std=c++11 modes.
C++ doesn't distinguish between the operator[]
for reading and writing, but only the operator[]
for const object and mutable (non-const) object. Since a_copy
is mutable, the mutable operator[]
will be chosen, which forces the copying because that operator returns a (mutable) reference.
If efficiency is a concern, you could cast the a_copy
to a const string
to force the const
version of operator[]
to be used, which won't make a copy of the internal buffer.
char f = static_cast<const string>(a_copy)[99];
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With