Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Legality of COW std::string implementation in C++11

People also ask

Can you use std::string in C?

The std::string class manages the underlying storage for you, storing your strings in a contiguous manner. You can get access to this underlying buffer using the c_str() member function, which will return a pointer to null-terminated char array. This allows std::string to interoperate with C-string APIs.

What does std::string () do?

std::string class in C++ C++ has in its definition a way to represent a sequence of characters as an object of the class. This class is called std:: string. String class stores the characters as a sequence of bytes with the functionality of allowing access to the single-byte character.

Is std::string the same as string?

There is no functionality difference between string and std::string because they're the same type.

Is std::string movable?

Yes, std::string (since C++11) is able to be moved i.e. it supports move semantics.


It's not allowed, because as per the standard 21.4.1 p6, invalidation of iterators/references is only allowed for

— as an argument to any standard library function taking a reference to non-const basic_string as an argument.

— Calling non-const member functions, except operator[], at, front, back, begin, rbegin, end, and rend.

For a COW string, calling non-const operator[] would require making a copy (and invalidating references), which is disallowed by the paragraph above. Hence, it's no longer legal to have a COW string in C++11.


The answers by Dave S and gbjbaanb are correct. (And Luc Danton's is correct too, although it's more a side-effect of forbidding COW strings rather than the original rule that forbids it.)

But to clear up some confusion, I'm going to add some further exposition. Various comments link to a comment of mine on the GCC bugzilla which gives the following example:

std::string s("str");
const char* p = s.data();
{
    std::string s2(s);
    (void) s[0];
}
std::cout << *p << '\n';  // p is dangling

The point of that example is to demonstrate why GCC's reference counted (COW) string is not valid in C++11. The C++11 standard requires this code to work correctly. Nothing in the code permits the p to be invalidated in C++11.

Using GCC's old reference-counted std::string implementation, that code has undefined behaviour, because p is invalidated, becoming a dangling pointer. (What happens is that when s2 is constructed it shares the data with s, but obtaining a non-const reference via s[0] requires the data to be unshared, so s does a "copy on write" because the reference s[0] could potentially be used to write into s, then s2 goes out of scope, destroying the array pointed to by p).

The C++03 standard explicitly permits that behaviour in 21.3 [lib.basic.string] p5 where it says that subsequent to a call to data() the first call to operator[]() may invalidate pointers, references and iterators. So GCC's COW string was a valid C++03 implementation.

The C++11 standard no longer permits that behaviour, because no call to operator[]() may invalidate pointers, references or iterators, irrespective of whether they follow a call to data().

So the example above must work in C++11, but does not work with libstdc++'s kind of COW string, therefore that kind of COW string is not permitted in C++11.


It is, CoW is an acceptable mechanism for making faster strings... but...

it makes multithreading code slower (all that locking to check if you're the only one writing kills performance when using a lot of strings). This was the main reason CoW was killed off years ago.

The other reasons are that the [] operator will return you the string data, without any protection for you to overwrite a string someone else expects to be unchanging. The same applies to c_str() and data().

Quick google says that the multithreading is basically the reason it was effectively disallowed (not explicitly).

The proposal says :

Proposal

We propose to make all iterator and element access operations safely concurrently executable.

We are increasing the stability of operations even in sequential code.

This change effectively disallows copy-on-write implementations.

followed by

The largest potential loss in performance due to a switch away from copy-on-write implementations is the increased consumption of memory for applications with very large read-mostly strings. However, we believe that for those applications ropes are a better technical solution, and recommend a rope proposal be considered for inclusion in Library TR2.

Ropes are part of STLPort and SGIs STL.


From 21.4.2 basic_string constructors and assignment operators [string.cons]

basic_string(const basic_string<charT,traits,Allocator>& str);

[...]

2 Effects: Constructs an object of class basic_string as indicated in Table 64. [...]

Table 64 helpfully documents that after construction of an object via this (copy) constructor, this->data() has as value:

points at the first element of an allocated copy of the array whose first element is pointed at by str.data()

There are similar requirements for other similar constructors.