Is it valid to create an iterator to end(str)+1 for std::string?
And if it isn't, why isn't it?
This question is restricted to C++11 and later because, before C++11, the data didn't have to be stored in a contiguous block, even though all but rare proof-of-concept toy implementations already stored it that way.
And I think that might make all the difference.
The significant difference I speculate on, between std::string and any other standard container, is that it always contains one element more than its size(): the zero-terminator, needed to fulfill the requirements of .c_str().
21.4.7.1 basic_string accessors [string.accessors]
const charT* c_str() const noexcept; const charT* data() const noexcept;
1 Returns: A pointer p such that p + i == &operator[](i) for each i in [0,size()].
2 Complexity: Constant time.
3 Requires: The program shall not alter any of the values stored in the character array.
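To make the quoted guarantee concrete, here is a small sketch, assuming C++11 or later. It checks that c_str() + i == &s[i] for every i in the closed range [0, size()], so the terminator is reachable through the pointer/index interface. c_str_matches_indexing is a hypothetical name, not a library function.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: checks the [string.accessors] identity
// c_str() + i == &operator[](i) for every i in the closed range [0, size()].
bool c_str_matches_indexing(const std::string& s) {
    const char* p = s.c_str();
    for (std::size_t i = 0; i <= s.size(); ++i) {  // <= : index size() is included
        if (p + i != &s[i]) return false;
    }
    // Index size() is valid for operator[] and yields the terminator.
    assert(s[s.size()] == '\0');
    return true;
}
```

Note that the loop bound is <=, not <: the index size() is explicitly covered by this paragraph.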
Still, even though it should, IMHO, guarantee that said expression is valid (for consistency and interoperability with zero-terminated strings, if nothing else), the only paragraph I found casts doubt on that:
21.4.1 basic_string general requirements [string.require]
4 The char-like objects in a basic_string object shall be stored contiguously. That is, for any basic_string object s, the identity &*(s.begin() + n) == &*s.begin() + n shall hold for all values of n such that 0 <= n < s.size().
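For contrast, a similar sketch for the [string.require] identity. Note the half-open bound n < s.size(): this paragraph says nothing about n == s.size(), because s.begin() + s.size() is s.end(), which may not be dereferenced. storage_is_contiguous is a hypothetical helper used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: checks the [string.require] identity
// &*(s.begin() + n) == &*s.begin() + n for every n with 0 <= n < s.size().
// n == s.size() is excluded on purpose: s.begin() + s.size() == s.end(),
// and end() must not be dereferenced.
bool storage_is_contiguous(const std::string& s) {
    const auto size = static_cast<std::ptrdiff_t>(s.size());
    for (std::ptrdiff_t n = 0; n < size; ++n) {
        if (&*(s.begin() + n) != &*s.begin() + n) return false;
    }
    return true;
}
```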
(All quotes are from C++14 final draft (n3936).)
Related: Legal to overwrite std::string's null terminator?
TL;DR: s.end() + 1 is undefined behavior.
std::string is a strange beast, mainly for historical reasons: it aims for compatibility with C strings, where an additional \0 character exists beyond the length reported by strlen. This led std::string, in C++03, to have 103 member functions, and a few more have been added since.
Therefore, discrepancies between the different methods should be expected.
Already in the index-based interface discrepancies appear:
§21.4.5 [string.access]
const_reference operator[](size_type pos) const;
reference operator[](size_type pos);
1/ Requires: pos <= size()
const_reference at(size_type pos) const;
reference at(size_type pos);
5/ Throws: out_of_range if pos >= size()
Yes, you read that right: s[s.size()] returns a reference to a NUL character, while s.at(s.size()) throws an out_of_range exception. If anyone tells you to replace all uses of operator[] with at because it is safer, beware the string trap...
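A minimal sketch of the trap, assuming C++11 or later (where operator[] at index size() is defined and yields charT()). Both helper names are hypothetical.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Index size() is accepted by operator[]: it yields the terminator.
char char_at_size_via_brackets(const std::string& s) {
    return s[s.size()];  // well-defined since C++11, returns charT()
}

// The same index is rejected by at(), which throws for pos >= size().
bool at_size_throws(const std::string& s) {
    try {
        (void)s.at(s.size());
    } catch (const std::out_of_range&) {
        return true;
    }
    return false;
}
```

So a mechanical replacement of s[i] by s.at(i) changes behavior exactly at i == s.size().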
So, what about iterators?
§21.4.3 [string.iterators]
iterator end() noexcept;
const_iterator end() const noexcept;
const_iterator cend() const noexcept;
2/ Returns: An iterator which is the past-the-end value.
Wonderfully bland.
So we have to refer to other paragraphs. A pointer is offered by
§21.4 [basic.string]
3/ The iterators supported by basic_string are random access iterators (24.2.7).
while §17.6 [requirements] seems devoid of anything related. Thus, string iterators are just plain old iterators (you can probably sense where this is going... but since we came this far, let's go all the way).
This leads us to:
24.2.1 [iterator.requirements.general]
5/ Just as a regular pointer to an array guarantees that there is a pointer value pointing past the last element of the array, so for any iterator type there is an iterator value that points past the last element of a corresponding sequence. These values are called past-the-end values. Values of an iterator i for which the expression *i is defined are called dereferenceable. The library never assumes that past-the-end values are dereferenceable. [...]
So, *s.end() is undefined behavior: s.end() is a past-the-end value, and past-the-end values are not dereferenceable.
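In practice, end() should only be compared against, never dereferenced. A sketch, assuming C++11; the helper names are hypothetical. When the terminator itself is needed, the pointer interface quoted from [string.accessors] does guarantee it, so c_str() is the way to reach it rather than *s.end().

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Walks iterators up to, but never through, end():
// end() is only compared against, never dereferenced or incremented.
std::size_t count_char(const std::string& s, char c) {
    std::size_t count = 0;
    for (auto it = s.begin(); it != s.end(); ++it) {
        if (*it == c) ++count;  // safe: it != s.end(), so it is dereferenceable
    }
    return count;
}

// Reaches the terminator through the pointer interface instead of *s.end().
const char* terminator_of(const std::string& s) {
    return s.c_str() + s.size();  // points at the '\0' guaranteed by c_str()
}
```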
24.2.3 [input.iterators]
2/ Table 107 -- Input iterator requirements (in addition to Iterator)
Lists, as a pre-condition for both ++r and r++, that r be dereferenceable.
None of the forward, bidirectional, or random access iterator requirements lift this restriction (and each indicates that it inherits the requirements of its predecessor).
Also, for completeness, in 24.2.7 [random.access.iterators], Table 111 -- Random access iterator requirements (in addition to bidirectional iterator) lists the following operational semantics:
- r += n is equivalent to [inc|dec]rementing r n times
- a + n and n + a are equivalent to copying a and then applying += n to the copy
and similarly for -= n and - n.
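To connect the two tables, the sketch below spells out r += n as n single increments, as Table 111 defines it, and asserts the precondition each ++r inherits from Table 107. It assumes n >= 0; advance_step_by_step is a hypothetical name used only for illustration, and real code would simply write r + n.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Mirrors the Table 111 semantics of r += n (n >= 0) as n single increments.
// Each ++r requires r to be dereferenceable (r != end), so n <= end - r.
std::string::const_iterator advance_step_by_step(std::string::const_iterator r,
                                                 std::string::const_iterator end,
                                                 std::ptrdiff_t n) {
    assert(n >= 0 && n <= end - r);  // precondition implied by the tables
    while (n-- > 0) {
        assert(r != end);  // ++end() would violate the input-iterator precondition
        ++r;
    }
    return r;
}
```

With r == s.end() and n == 1, the precondition check fails, which is exactly the s.end() + 1 case.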
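Thus s.end() + 1 amounts to incrementing s.end() once, which violates the requirement that the incremented iterator be dereferenceable: s.end() + 1 is undefined behavior.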
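If the goal is to move forward by some amount without ever overshooting, one option is to clamp the step to the remaining distance, so the result always stays within [it, end] and nothing past end() is ever formed. This is a sketch; clamped_advance is a hypothetical helper, not part of the standard library.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical helper: advances it by up to n positions but never past end,
// so the returned iterator always lies in [it, end].
std::string::const_iterator clamped_advance(std::string::const_iterator it,
                                            std::string::const_iterator end,
                                            std::string::difference_type n) {
    assert(n >= 0);
    return it + std::min(n, end - it);  // end - it is the remaining distance
}
```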