My colleague's code looked like this:
    void copy(std::string const& s, char *d) {
        for(int i = 0; i <= s.size(); i++, d++)
            *d = s[i];
    }
His application crashes, and I think it is because this accesses s out of range, since the condition should go only up to s.size() - 1.
But other people sitting near me say there was a discussion in the past about this being legal. Can anyone please clear this up for me?
Let's put aside the possibility that *d is invalid, since that has nothing to do with what the question seems directed at: whether or not std::string's operator[]() has well-defined behavior when accessing the "element" at index std::string::size().
The C++03 standard has the following description of string::operator[]() (21.3.4 "basic_string element access"):
    const_reference operator[](size_type pos) const;
    reference operator[](size_type pos);

    Returns: If pos < size(), returns data()[pos]. Otherwise, if pos == size(), the const version returns charT(). Otherwise, the behavior is undefined.
Since s in the example code is const, the behavior is well defined and s[s.size()] will return a null character. However, if s were not a const string, the behavior would be undefined.
C++11 remedies the oddity of the const version behaving so differently from the non-const version in this edge case. C++11 21.4.5 "basic_string element access" says:
    const_reference operator[](size_type pos) const;
    reference operator[](size_type pos);

    Requires: pos <= size().
    Returns: *(begin() + pos) if pos < size(), otherwise a reference to an object of type T with value charT(); the referenced value shall not be modified.
So for a C++11 compiler, the behavior is well defined whether or not the string is const.
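As a minimal sketch of the two cases just described, the snippet below reads the element at index size() through a const and a non-const reference. It assumes a C++11 or later compiler; the function names are made up for this illustration and do not come from the question.

```cpp
#include <cassert>
#include <string>

// Reads the element one past the last character through a const reference.
// Well defined in both C++03 and C++11: the result is charT(), i.e. '\0'.
char terminator_via_const(std::string const& s) {
    return s[s.size()];
}

// Reads the same element through a non-const reference.
// Defined only from C++11 on; under the C++03 wording this was undefined.
char terminator_via_nonconst(std::string& s) {
    return s[s.size()];
}

// Hypothetical self-check tying the two together.
void check_terminator_reads() {
    std::string s = "abc";
    assert(terminator_via_const(s) == '\0');
    assert(terminator_via_nonconst(s) == '\0');
}
```

Both functions only read the value; neither writes through the returned reference.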
Unrelated to the question, I find it a little strange that C++11 says that "the referenced value shall not be modified". It's not clear to me whether that clause applies only in the case where pos == size(). I'm pretty sure there's a ton of existing code that does things like s[i] = some_character; where s is a non-const std::string and i < s.size(). Is that undefined behavior now? I suspect that the clause applies only to the special-case charT() object.
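To make the distinction concrete, here is a small sketch of the kind of existing code in question: it writes through operator[] only at indices below size(), so it never touches the special-case charT() object. The helper names are hypothetical, chosen just for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: overwrite every real character of s with fill.
// The loop condition is i < s.size(), so the element at index size()
// (the terminator) is never written.
void fill_in_range(std::string& s, char fill) {
    for (std::size_t i = 0; i < s.size(); ++i) {
        s[i] = fill;
    }
}

// Hypothetical self-check for the helper above.
void check_fill_in_range() {
    std::string s = "abc";
    fill_in_range(s, 'x');
    assert(s == "xxx");
    assert(s.size() == 3);
}
```

Writes of this form stay inside [0, size()) and are the ordinary, well-defined use of the non-const operator[].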
Another interesting thing is that neither standard seems to require that the address of the object returned for s[s.size()] be in any way related to the address of the object returned for s[s.size() - 1]. In other words, it seems like the returned charT() reference doesn't have to be contiguous with the end of the string data. I suspect that this is to give implementers the choice of just returning a reference to a single static copy of that sentinel element if desired (that would also explain C++11's "shall not be modified" restriction, assuming it applies only to the special case).
cppreference says this:
    reference operator[]( size_type pos );
    const_reference operator[]( size_type pos ) const;

    If pos == size(),
    - The const version returns a reference to the character with value CharT() (the null character). (until C++11)
    - Both versions return a reference to the character with value CharT() (the null character). Modifying the null character through the non-const reference results in undefined behavior. (since C++11)
So it is OK so long as you don't modify the null character.
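Putting this together, one possible way to write the question's copy() so that the read at index size() stays legal and the destination is checked might look like the sketch below. The name copy_with_terminator and the extra d_size parameter are assumptions added for illustration; they are not part of the original code, and nullptr assumes C++11 or later.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical variant of the question's copy(): copies s plus its
// terminating null character into d, which must hold d_size chars.
// Returns false and writes nothing if d is null or too small.
bool copy_with_terminator(std::string const& s, char* d, std::size_t d_size) {
    if (d == nullptr || d_size < s.size() + 1) {
        return false;
    }
    // i runs up to and including s.size(); because s is const, the last
    // read yields charT(), which is '\0' (well defined in C++03 and C++11).
    for (std::size_t i = 0; i <= s.size(); ++i) {
        d[i] = s[i];
    }
    return true;
}

// Hypothetical self-check for the function above.
void check_copy_with_terminator() {
    char buf[4];
    assert(copy_with_terminator(std::string("abc"), buf, sizeof buf));
    assert(buf[3] == '\0');
}
```

The index is a std::size_t rather than an int, which avoids the signed/unsigned comparison in the original loop. The string is only read from, so the "do not modify the null character" rule is never at issue; a crash like the one described would more plausibly come from d pointing at too little storage.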