I've read various descriptions of std::string::c_str, including questions raised on SO over the years/decades. I like this description for its clarity:
Returns a pointer to an array that contains a null-terminated sequence of characters (i.e., a C-string) representing the current value of the string object. This array includes the same sequence of characters that make up the value of the string object plus an additional terminating null-character ('\0') at the end.
However, some things about the purpose of this function are still unclear.
You could be forgiven for thinking that calling c_str might add a \0 character to the end of the string stored in the internal char array of the host object (std::string):
s[s.size()] = '\0'
But it seems std::string objects are null-terminated by default, even before calling c_str.
After looking through the definition:
const _Elem *c_str() const _NOEXCEPT
    { // return pointer to null-terminated nonmutable array
    return (this->_Myptr());
    }
I don't see any code which would add \0 to the end of a char array. As far as I can tell, c_str just returns a pointer to the char stored in the first element of the array, pretty much like begin() does. I don't even see code which checks that the internal array is terminated by \0.
Or am I missing something?
Before C++11, there was no requirement that a std::string (or the templated class std::basic_string, of which std::string is an instantiation) store a trailing '\0'. This was reflected in the different specifications of the data() and c_str() member functions: data() returned a pointer to the underlying data (which was not required to be terminated with a '\0'), and c_str() returned a pointer to an array with a terminating '\0', which could be a copy. Equally, however, there was no requirement NOT to store a trailing '\0' internally (accessing characters past the end of the stored data was undefined behaviour), and, for simplicity, some implementations chose to append a trailing '\0' anyway.
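As a rough illustration of that distinction, here is a minimal sketch of how code that does not want to rely on a terminator behind data() can be written. The helper names copy_chars, c_length and check_helpers are made up for this example and are not part of any library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: copy the characters of s into out without assuming
// that data() is '\0'-terminated. The length comes from size().
std::size_t copy_chars(const std::string& s, char* out, std::size_t cap) {
    const std::size_t n = s.size() < cap ? s.size() : cap;
    std::memcpy(out, s.data(), n);
    return n;
}

// Hypothetical helper: c_str() is the call to use when a C function
// (here std::strlen) has to find the terminator itself.
std::size_t c_length(const std::string& s) {
    return std::strlen(s.c_str());
}

// Usage: both helpers agree for a string with no embedded '\0'.
void check_helpers() {
    const std::string s = "hello";
    char buf[8];
    assert(copy_chars(s, buf, sizeof buf) == 5);
    assert(std::memcmp(buf, "hello", 5) == 0);
    assert(c_length(s) == 5);
}
```

Pairing data() with size() works the same way under every standard, which is why it was the safe habit before C++11.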
With C++11, this changed. Essentially, the data() member function was specified as giving the same effect as c_str() (i.e. the returned pointer is to the first character of an array that has a trailing '\0'). That has the consequence of requiring the trailing '\0' on the array returned by data(), and therefore on the internal representation.
So the behaviour you're seeing is consistent with C++11: one of the invariants of the class is a trailing '\0' (i.e. constructors ensure that is the case, member functions which modify the string ensure it remains true, and all public member functions can rely on it being true).
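The invariant can be observed from outside the class. The following is a small sketch, assuming a C++11 or later compiler; is_terminated and check_invariant are names invented for this example.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical check: true if s shows the C++11 guarantees described above.
bool is_terminated(const std::string& s) {
    return s.data() == s.c_str()          // both point at the same array
        && s.c_str()[s.size()] == '\0'    // the terminator sits at index size()
        && s[s.size()] == '\0';           // const operator[] may read index size()
}

// Usage: the invariant holds after construction and after modification.
void check_invariant() {
    std::string s = "abc";
    assert(is_terminated(s));
    s += "def";                           // a modifying member function
    assert(is_terminated(s));
    s.clear();                            // even an empty string is terminated
    assert(is_terminated(s));
    assert(std::strlen(s.c_str()) == 0);
}
```

Reading s[s.size()] is allowed, but writing anything other than '\0' there is not, so the check only reads.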
The behaviour you're seeing is not inconsistent with C++ standards before C++11. Strictly speaking, std::string before C++11 was not required to maintain a trailing '\0' but, equally, an implementer could choose to do so.
You do not see code that adds '\0' to the end of the sequence because the null character is already there. An implementation of c_str cannot return a pointer to a new array that the caller would have to manage, so the array must be owned by the std::string object itself.
Hence, you have two valid approaches for implementing this:
1. Always store '\0' at the end of the _Myptr() array of characters on construction, or
2. Make a copy of the characters with a trailing '\0' when c_str() is called, and delete the copy in the destructor.
The first approach lets you return _Myptr() for c_str(), at the expense of storing an extra character for each string. The second approach requires an extra pointer per std::string object, so the first approach is less expensive.
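To make the trade-off concrete, here is a simplified sketch of both approaches using two toy classes. EagerString, LazyString and check_approaches are hypothetical names for illustration only. The classes are immutable and non-copyable to keep the example short, and they are not how any real standard library implements std::string.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Approach 1: keep the '\0' in the buffer at all times.
class EagerString {
public:
    explicit EagerString(const char* text)
        : size_(std::strlen(text)), buf_(new char[size_ + 1]) {
        std::memcpy(buf_, text, size_ + 1);      // copies the terminator too
    }
    ~EagerString() { delete[] buf_; }
    EagerString(const EagerString&) = delete;
    EagerString& operator=(const EagerString&) = delete;

    std::size_t size() const { return size_; }
    const char* c_str() const { return buf_; }   // nothing to add here

private:
    std::size_t size_;
    char* buf_;                                  // size_ + 1 characters
};

// Approach 2: store only the characters, build a terminated copy on demand.
class LazyString {
public:
    explicit LazyString(const char* text)
        : size_(std::strlen(text)), buf_(new char[size_]), copy_(nullptr) {
        std::memcpy(buf_, text, size_);          // no terminator stored
    }
    ~LazyString() { delete[] copy_; delete[] buf_; }
    LazyString(const LazyString&) = delete;
    LazyString& operator=(const LazyString&) = delete;

    std::size_t size() const { return size_; }
    const char* c_str() const {
        if (!copy_) {                            // first call: make the copy
            copy_ = new char[size_ + 1];
            std::memcpy(copy_, buf_, size_);
            copy_[size_] = '\0';
        }
        return copy_;                            // owned by the object
    }

private:
    std::size_t size_;
    char* buf_;                                  // size_ characters
    mutable char* copy_;                         // the extra pointer
};

// Usage: both give the same C string, but the lazy object is larger.
void check_approaches() {
    EagerString a("hi");
    LazyString b("hi");
    assert(std::strcmp(a.c_str(), "hi") == 0);
    assert(std::strcmp(b.c_str(), "hi") == 0);
    assert(sizeof(LazyString) > sizeof(EagerString));
}
```

In the eager version c_str() is a plain accessor, which matches the definition shown in the question.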