 

Are end+1 iterators for std::string allowed?


Is it valid to create an iterator to end(str)+1 for std::string?
And if it isn't, why isn't it?

This question is restricted to C++11 and later because, although before C++11 the data was already stored in a contiguous block by all but rare proof-of-concept toy implementations, it did not have to be stored that way.
And I think that might make all the difference.

The significant difference I am speculating about, between std::string and every other standard container, is that it always contains one element more than its size: the zero-terminator, which is needed to fulfill the requirements of .c_str().

21.4.7.1 basic_string accessors [string.accessors]

const charT* c_str() const noexcept;
const charT* data() const noexcept;

1 Returns: A pointer p such that p + i == &operator[](i) for each i in [0,size()].
2 Complexity: Constant time.
3 Requires: The program shall not alter any of the values stored in the character array.
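To make the quoted guarantee concrete, here is a small sketch (assuming C++11 or later; the function names are made up for illustration) that checks c_str() + i == &s[i] for every i in [0, size()], including i == size(), and that the element found there is the terminator:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Checks the [string.accessors] guarantee: c_str() + i == &s[i]
// for every i in [0, size()], i.e. including i == size().
bool c_str_matches_operator_brackets(const std::string& s) {
    const char* p = s.c_str();
    for (std::size_t i = 0; i <= s.size(); ++i) {
        if (p + i != &s[i]) return false;
    }
    return true;
}

// The element at index size() is the zero-terminator, reachable
// through both operator[] and c_str().
bool terminator_is_reachable(const std::string& s) {
    return s[s.size()] == '\0' && s.c_str()[s.size()] == '\0';
}
```

Note that both routes are index- or pointer-based; neither involves an iterator.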

Still, even though the standard should IMHO guarantee that said expression is valid, for consistency and interoperability with zero-terminated strings if nothing else, the only relevant paragraph I found casts doubt on that:

21.4.1 basic_string general requirements [string.require]

4 The char-like objects in a basic_string object shall be stored contiguously. That is, for any basic_string object s, the identity &*(s.begin() + n) == &*s.begin() + n shall hold for all values of n such that 0 <= n < s.size().

(All quotes are from C++14 final draft (n3936).)
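The [string.require] identity can be checked directly. A minimal sketch (C++11 or later assumed; the function name is hypothetical) that respects the paragraph's strict bound n < s.size(), since at n == s.size() the iterator s.begin() + n is s.end() and may not be dereferenced:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Checks &*(s.begin() + n) == &*s.begin() + n for 0 <= n < s.size().
// The paragraph says nothing about n == s.size(), so neither does this.
bool storage_is_contiguous(const std::string& s) {
    if (s.empty()) return true;  // *s.begin() would dereference end()
    const char* first = &*s.begin();
    for (std::size_t n = 0; n < s.size(); ++n) {
        const auto offset = static_cast<std::string::difference_type>(n);
        if (&*(s.begin() + offset) != first + n) return false;
    }
    return true;
}
```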

Related: Legal to overwrite std::string's null terminator?

asked Nov 11 '15 at 18:11 by Deduplicator




1 Answer

TL;DR: s.end() + 1 is undefined behavior.


std::string is a strange beast, mainly for historical reasons:

  1. It attempts to bring C compatibility, where it is known that an additional \0 character exists beyond the length reported by strlen.
  2. It was designed with an index-based interface.
  3. As an afterthought, when it was merged into the Standard library with the rest of the STL code, an iterator-based interface was added.

This led std::string, in C++03, to have 103 member functions, and a few more have been added since.

Therefore, discrepancies between the different methods should be expected.


Discrepancies already appear within the index-based interface:

§21.4.5 [string.access]

const_reference operator[](size_type pos) const;
reference operator[](size_type pos);

1/ Requires: pos <= size()

const_reference at(size_type pos) const;
reference at(size_type pos);

5/ Throws: out_of_range if pos >= size()

Yes, you read that right: s[s.size()] returns a reference to a NUL character, while s.at(s.size()) throws an out_of_range exception. If anyone tells you to replace all uses of operator[] with at because it is safer, beware the string trap...
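The trap is easy to demonstrate. A short sketch (C++11 or later assumed; the helper names are made up for illustration) showing that the same position is accepted by operator[] and rejected by at():

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// operator[] accepts pos == size() and yields the terminator...
char bracket_at_size(const std::string& s) {
    return s[s.size()];
}

// ...whereas at() rejects pos == size() with std::out_of_range.
bool at_size_throws(const std::string& s) {
    try {
        (void)s.at(s.size());
    } catch (const std::out_of_range&) {
        return true;
    }
    return false;
}
```

So a mechanical operator[] to at() rewrite turns any code that legitimately reads s[s.size()] into code that throws.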


So, what about iterators?

§21.4.3 [string.iterators]

iterator end() noexcept;
const_iterator end() const noexcept;
const_iterator cend() const noexcept;

2/ Returns: An iterator which is the past-the-end value.

Wonderfully bland.

So we have to refer to other paragraphs. A pointer is offered by

§21.4 [basic.string]

3/ The iterators supported by basic_string are random access iterators (24.2.7).

while §17.6 [requirements] seems devoid of anything related. Thus, string iterators are just plain old iterators (you can probably sense where this is going... but since we have come this far, let's go all the way).

This leads us to:

24.2.1 [iterator.requirements.general]

5/ Just as a regular pointer to an array guarantees that there is a pointer value pointing past the last element of the array, so for any iterator type there is an iterator value that points past the last element of a corresponding sequence. These values are called past-the-end values. Values of an iterator i for which the expression *i is defined are called dereferenceable. The library never assumes that past-the-end values are dereferenceable. [...]

So, *s.end() is undefined behavior: s.end() is a past-the-end value, and nothing makes it dereferenceable.
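What remains legal with s.end() is comparing against it and stepping back from it. A minimal sketch (C++11 or later assumed; function names are hypothetical) that never dereferences or increments the past-the-end iterator:

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>

// Counts characters by walking begin() up to end(). end() is only ever
// compared against, never dereferenced or incremented.
std::size_t count_by_walking(const std::string& s) {
    std::size_t count = 0;
    for (auto it = s.begin(); it != s.end(); ++it) ++count;
    return count;
}

// Stepping *back* from end() is fine for a non-empty string:
// std::prev(s.end()) is the last real character, not the terminator.
char last_char(const std::string& s) {
    assert(!s.empty());
    return *std::prev(s.end());
}
```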

24.2.3 [input.iterators]

2/ Table 107 -- Input iterator requirements (in addition to Iterator)

It lists, as a precondition of ++r and r++, that r be dereferenceable.

None of the forward, bidirectional, or random access iterator requirements lifts this restriction (and each indicates that it inherits the requirements of its predecessor).

Also, for completeness, in 24.2.7 [random.access.iterators], Table 111 -- Random access iterator requirements (in addition to bidirectional iterator) lists the following operational semantics:

  • r += n is equivalent to incrementing r n times (or decrementing it -n times when n is negative)
  • a + n and n + a are equivalent to copying a and then applying += n to the copy

and similarly for -= n and - n.
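To make the chain of reasoning concrete, here is a sketch of the operational semantics Table 111 gives for r += n, written as explicit steps. The helper names are hypothetical, and this is not how any library actually implements operator+= (real implementations just add an offset); it only mirrors the table's definition.

```cpp
#include <cassert>
#include <iterator>
#include <string>

// r += n, spelled out as n applications of ++r
// (or -n applications of --r when n is negative).
template <class RandomIt>
RandomIt plus_equals_as_steps(
        RandomIt r, typename std::iterator_traits<RandomIt>::difference_type n) {
    if (n >= 0) {
        while (n--) ++r;  // each ++r requires r to be dereferenceable
    } else {
        while (n++) --r;
    }
    return r;
}

// Compares the step-by-step version with the built-in operator+.
// The caller must keep start and start + n within [0, s.size()].
bool matches_builtin(const std::string& s,
                     std::string::difference_type start,
                     std::string::difference_type n) {
    auto first = s.begin() + start;
    return plus_equals_as_steps(first, n) == first + n;
}
```

Applied to r = s.end() with n = 1, the very first ++r would already violate the dereferenceability precondition from Table 107.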

Thus s.end() + 1 is undefined behavior.
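If what you actually need is "advance by up to n, but never past the end", one common workaround is to clamp the offset against the distance remaining. A minimal sketch, assuming random access iterators and n >= 0; advance_clamped and clamped_offset are hypothetical names, not standard library functions:

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <string>

// Advances `it` by up to n positions but never past `last`, so the result
// stays within [it, last] and no past-the-end iterator is ever incremented.
template <class RandomIt>
RandomIt advance_clamped(RandomIt it,
                         typename std::iterator_traits<RandomIt>::difference_type n,
                         RandomIt last) {
    assert(n >= 0);
    return it + std::min(n, last - it);
}

// Usage: the index reached when advancing from s.begin() by n, clamped to s.size().
std::string::difference_type clamped_offset(const std::string& s,
                                            std::string::difference_type n) {
    return advance_clamped(s.begin(), n, s.end()) - s.begin();
}
```

For "abc", asking for 4 positions stops at s.end() (offset 3) instead of forming the undefined s.end() + 1.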

answered Sep 29 '22 at 00:09 by Matthieu M.