Today I passed by this SO question: Legality of COW std::string implementation in C++11
The most voted answer (35 upvotes) for that question says:
It's not allowed, because as per the standard 21.4.1 p6, invalidation of iterators/references is only allowed for
— as an argument to any standard library function taking a reference to non-const basic_string as an argument.
— Calling non-const member functions, except operator[], at, front, back, begin, rbegin, end, and rend.
For a COW string, calling non-const operator[] would require making a copy (and invalidating references), which is disallowed by the paragraph above. Hence, it's no longer legal to have a COW string in C++11.
I wonder whether that justification is valid or not because it seems C++03 has similar requirements for string iterator invalidation:
References, pointers, and iterators referring to the elements of a basic_string sequence may be invalidated by the following uses of that basic_string object:
- As an argument to non-member functions swap() (21.3.7.8), operator>>() (21.3.7.9), and getline() (21.3.7.9).
- As an argument to basic_string::swap().
- Calling data() and c_str() member functions.
- Calling non-const member functions, except operator[](), at(), begin(), rbegin(), end(), and rend().
- Subsequent to any of the above uses except the forms of insert() and erase() which return iterators, the first call to non-const member functions operator[](), at(), begin(), rbegin(), end(), or rend().
These are not exactly the same as those of C++11's, but at least the same for the part of operator[]()
, which the original answer took as the major justification. So I guess, in order to justify the illegality of COW std::string implementation in C++11, some other standard requirements need to be cited. Help needed here.
That SO question has been inactive for over a year, so I've decided to raise this as a separate question. Please let me know if this is inappropriate and I will find some other way to clear my doubt.
The std::string class manages the underlying storage for you, storing your strings in a contiguous manner. You can get access to this underlying buffer using the c_str() member function, which will return a pointer to null-terminated char array. This allows std::string to interoperate with C-string APIs.
The c++ solution for strings are quite different from the c-version. The first and most important difference is while the c using the ASCIIZ solution, the std::string and std::wstring are using two iterators (pointers) to store the actual string.
C++ has in its definition a way to represent a sequence of characters as an object of the class. This class is called std:: string. String class stores the characters as a sequence of bytes with the functionality of allowing access to the single-byte character.
There is no functionality difference between string and std::string because they're the same type.
The key point is the last point in the C++03 standard. The
wording could be a lot clearer, but the intent is that the first
call to []
, at
, etc. (but only the first call) after
something which established new iterators (and thus invalidated
old ones) could invalidate iterators, but only the first. The
wording in C++03 was, in fact, a quick hack, inserted in
response to comments by the French national body on the CD2 of
C++98. The original problem is simple: consider:
std::string a( "some text" );
std::string b( a );
char& rc = a[2];
At this point, modifications through rc
must affect a
, but
not b
. If COW is being used, however, when a[2]
is called,
a
and b
share a representation; in order for writes through
the returned reference not to affect b
, a[2]
must be
considered a "write", and be allowed to invalidate the
reference. Which is what CD2 said: any call to a non-const
[]
, at
, or one of the begin
or end
functions could
invalidate iterators and references. The French national body
comments pointed out that this rendered a[i] == a[j]
invalid,
since the reference returned by one of the []
would be
invalidated by the other. The last point you cite of C++03 was
added to circumvent this—only the first call to []
et
al. could invalidate the iterators.
I don't think anyone was totally happy with the results. The
wording was done quickly, and while the intent was clear to
those who were aware of the history, and the original problem,
I don't think it was fully clear from standard. In addition,
some experts began to question the value of COW to begin with,
given the relative impossibility of the string class itself to
reliably detect all writes. (If a[i] == a[j]
is the complete
expression, there is no write. But the string class itself must
assume that the return value of a[i]
may result in a write.)
And in a multi-threaded environment, the cost of managing the
reference count needed for copy on write was deemed a relatively
high cost for something you usually don't need. The result is
that most implementations (which supported threading long before
C++11) have been moving away from COW anyway; as far as I know,
the only major implementation still using COW was g++ (but there
was a known bug in their multithreaded implementation) and
(maybe) Sun CC (which the last time I looked at it, was
inordinately slow, because of the cost of managing the counter).
I think the committee simply took what seemed to them the
simplest way of cleaning things up, by forbidding COW.
Some more clarification with regards to why a COW implementation
has to invalidate iterators on the first call to []
. Consider
a naïve implementation of COW. (I will just call it String, and
ignore all of the issues involving traits and allocators, which
aren't really relevant here. I'll also ignore exception and
thread safety, just to make things as simple as possible.)
class String
{
struct StringRep
{
int useCount;
size_t size;
char* data;
StringRep( char const* text, size_t size )
: useCount( 1 )
, size( size )
, data( ::operator new( size + 1 ) )
{
std::memcpy( data, text, size ):
data[size] = '\0';
}
~StringRep()
{
::operator delete( data );
}
};
StringRep* myRep;
public:
String( char const* initial_text )
: myRep( new StringRep( initial_text, strlen( initial_text ) ) )
{
}
String( String const& other )
: myRep( other.myRep )
{
++ myRep->useCount;
}
~String()
{
-- myRep->useCount;
if ( myRep->useCount == 0 ) {
delete myRep;
}
}
char& operator[]( size_t index )
{
return myRep->data[index];
}
};
Now imagine what happens if I write:
String a( "some text" );
String b( a );
a[4] = '-';
What is the value of b
after this? (Run through the code by
hand, if you're not sure.)
Obviously, this doesn't work. The solution is to add a flag,
bool uncopyable;
to StringRep
, which is initialized to
false
, and to modify the following functions:
String::String( String const& other )
{
if ( other.myRep->uncopyable ) {
myRep = new StringRep( other.myRep->data, other.myRep->size );
} else {
myRep = other.myRep;
++ myRep->useCount;
}
}
char& String::operator[]( size_t index )
{
if ( myRep->useCount > 1 ) {
-- myRep->useCount;
myRep = new StringRep( myRep->data, myRep->size );
}
myRep->uncopyable = true;
return myRep->data[index];
}
This means, of course, that []
will invalidate iterators and
references, but only the first time it is called on an object.
The next time, the useCount
will be one (and the image will be
uncopyable). So a[i] == a[j]
works; regardless of which the
compiler actually evaluates first (a[i]
or a[j]
), the second
one will find a useCount
of 1, and will not have to duplicate.
And because of the uncopyable
flag,
String a( "some text" );
char& c = a[4];
String b( a );
c = '-';
will also work, and not modify b
.
Of course, the above is enormously simplified. Getting it to work in a multithreaded environment is extremely difficult, unless you simply grab a mutex for the entire function for any function which might modify anything (in which case, the resulting class is extremely slow). G++ tried, and failed—there is on particular use case where it breaks. (Getting it to handle the other issues I've ignored is not particularly difficult, but does represent a lot of lines of code.)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With