
string[length()] in C++, is it OK?

Tags: c++, string, c++11

My colleague's code looked like this:

void copy(std::string const& s, char *d) {
  for(int i = 0; i <= s.size(); i++, d++)
    *d = s[i];
}

His application crashes and I think that it is because this accesses s out of range, since the condition should go only up to s.size() - 1.

But other guys next to me say there was a discussion in the past about this being legal. Can anyone please clear this up for me?

Johannes Schaub - litb asked May 13 '12 09:05


2 Answers

Let's put aside the possibility that *d is invalid, since that has nothing to do with what the question seems directed at: whether or not std::string operator[]() has well-defined behavior when accessing the "element" at index std::string::size().

The C++03 standard has the following description of string::operator[]() (21.3.4 "basic_string element access"):

const_reference operator[](size_type pos) const;
reference operator[](size_type pos);

Returns: If pos < size(), returns data()[pos]. Otherwise, if pos == size(), the const version returns charT(). Otherwise, the behavior is undefined.

Since s in the example code is const, the behavior is well defined and s[s.size()] will return a null character. However, if s were not a const string, the behavior would be undefined.

C++11 removes this oddity, in which the const version behaves so differently from the non-const version in this edge case. C++11 21.4.5 "basic_string element access" says:

const_reference operator[](size_type pos) const;
reference operator[](size_type pos);

Requires: pos <= size().

Returns: *(begin() + pos) if pos < size(), otherwise a reference to an object of type T with value charT(); the referenced value shall not be modified.

So for a C++11 compiler, the behavior is well-defined whether or not the string is const.
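The difference between the two overloads can be sketched as follows. This is only an illustration under the wording quoted above; the helper names read_end_const, read_end_mutable and demo_end_reads are hypothetical and do not come from the question.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: reads s[s.size()] through the const overload.
// Per the quoted wording this is defined in both C++03 and C++11.
char read_end_const(const std::string& s) {
    return s[s.size()];
}

// Hypothetical helper: the same read through the non-const overload.
// Undefined per the C++03 wording; defined since C++11 as long as the
// returned character is only read, never written.
char read_end_mutable(std::string& s) {
    return s[s.size()];
}

// Small demonstration, assuming a C++11 (or later) compiler.
void demo_end_reads() {
    std::string s = "abc";
    assert(read_end_const(s) == '\0');
    assert(read_end_mutable(s) == '\0');
}
```

Calling demo_end_reads() from main should pass silently on a conforming C++11 implementation.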

Unrelated to the question, I find it a little strange that C++11 says that "the referenced value shall not be modified" - it's not clear to me if that clause applies only in the case where pos == size(). I'm pretty sure there's a ton of existing code that does things like s[i] = some_character; where s is a non-const std::string and i < s.size(). Is that undefined behavior now? I suspect that that clause applies only to the special-case charT() object.

Another interesting thing is that neither standard seems to require that the address of the object returned for s[s.size()] be in any way related to the address of the object returned for s[s.size() - 1]. In other words, it seems like the returned charT() reference doesn't have to be contiguous to the end of the string data. I suspect that this is to give implementers a choice to just return a reference to a single static copy of that sentinel element if desired (that would also explain C++11's "shall not be modified" restriction, assuming it applies only to the special case).
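One way to probe the address question on a given implementation is sketched below. Note that C++11's description of c_str() and data() (21.4.7.1) requires the returned pointer p to satisfy p + i == &operator[](i) for each i in [0, size()], which does seem to tie the terminator's address to the character data in C++11; the quoted C++03 wording gives no such guarantee. The helper name terminator_is_contiguous is hypothetical.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: reports whether the object returned by s[s.size()]
// sits directly after the last character reachable through data().
// Assumes a C++11 (or later) library.
bool terminator_is_contiguous(const std::string& s) {
    const char* base = s.data();
    return &s[s.size()] == base + s.size();
}
```

On a C++11 library this should return true for any string, including an empty one.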

Michael Burr answered Oct 11 '22 15:10


cppreference says this:

reference       operator[]( size_type pos );

const_reference operator[]( size_type pos ) const;

If pos==size(),

  • The const version returns a reference to the character with value CharT() (the null character). (until C++11)
  • Both versions return a reference to the character with value CharT() (the null character). Modifying the null character through non-const reference results in undefined behavior. (since C++11)

So it is OK so long as you don't modify the null character.
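Since reading s[s.size()] is fine, the crash in the question more likely comes from the destination buffer. A minimal sketch of a variant that checks the destination is shown here; the function name copy_with_terminator and its capacity parameter are assumptions for illustration, not part of the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical variant of the question's copy(): the caller passes the
// capacity of d, and the function refuses to write past it.
// Returns true if s plus its terminating null character fit into d.
bool copy_with_terminator(const std::string& s, char* d, std::size_t capacity) {
    if (d == nullptr || capacity < s.size() + 1) {
        return false;
    }
    for (std::size_t i = 0; i <= s.size(); ++i) {
        d[i] = s[i];  // at i == s.size() this only reads the null character
    }
    return true;
}
```

With a four-byte buffer, copying "abc" succeeds and leaves a null character in the last byte; with a three-byte buffer the function returns false and writes nothing.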

Pubby answered Oct 11 '22 13:10