Iterator invalidation by `std::string::begin()`/`std::string::end()`?

#include <string>
#include <iostream>

int main() {
    std::string s = "abcdef";

    std::string s2 = s;

    auto begin = const_cast<std::string const &>(s2).begin();
    auto end = s2.end();

    std::cout << end - begin << '\n';
}

This code mixes the result of begin() const with the result of end(). Neither of these functions is permitted to invalidate any iterators. However, I'm curious whether the requirement that end() not invalidate the iterator begin actually means that begin is usable together with end.

Consider a C++98 copy-on-write implementation of std::string: the non-const begin() and end() functions cause the internal buffer to be copied, because the result of these functions can be used to modify the string. So begin above starts out valid for both s and s2, but the call to the non-const end() member causes it to no longer be valid for s2, the container that produced it.

The above code produces 'unexpected' results with a copy-on-write implementation, such as libstdc++. Instead of end - begin being the same as s2.size(), libstdc++ produces another number.

  • Does causing begin to no longer be a valid iterator into s2, the container it was retrieved from, constitute 'invalidating' the iterator? If you look at the requirements on iterators, they all appear to hold for this iterator after .end() is called, so perhaps begin still qualifies as a valid iterator, and thus has not been invalidated?

  • Is the above code well defined in C++98? Is it well defined in C++11, which prohibits copy-on-write implementations?

From my own brief reading of the specs, it appears under-specified, so that there may not be any guarantee that the results of begin() and end() can ever be used together, even without mixing const and non-const versions.

asked Feb 26 '15 at 17:02 by bames53

2 Answers

As you say, C++11 differs from earlier versions in this regard. There's no problem in C++11 because all attempts to allow copy on write were removed. In pre-C++11, your code results in undefined behavior; the call s2.end() is allowed to invalidate existing iterators (and did, and maybe still does, in g++).

Note that even if s2 were not a copy, the standard would allow it to invalidate iterators. In fact, the CD for C++98 even made things like f( s.begin(), s.end() ) or s[i] == s[j] undefined behavior. This was only realized at the last minute, and corrected so that only the first call to begin(), end() or [] could invalidate the iterators.

answered Sep 26 '22 at 05:09 by James Kanze


The code is OK: a CoW implementation is pretty much required to unshare when an iterator or a reference to an element is held. That is, when something has accessed an element of one string (through an iterator or the subscript operator) and a copy of that string ventures to do the same, the string will have to be unshared. The implementation could also keep track of its iterators and update them as needed.

Of course, in a concurrent system it is nearly impossible to do all this without data races, but pre-C++11 the standard has no notion of threads, so there are no data races as far as it is concerned.

answered Sep 22 '22 at 05:09 by Dietmar Kühl