
std::string::c_str & Null termination

Tags: c++, string, c-str

I've read various descriptions of std::string::c_str, including questions raised on SO over the years/decades.

I like this description for its clarity:

Returns a pointer to an array that contains a null-terminated sequence of characters (i.e., a C-string) representing the current value of the string object. This array includes the same sequence of characters that make up the value of the string object plus an additional terminating null-character ('\0') at the end.

However some things about the purpose of this function are still unclear.

You could be forgiven for thinking that calling c_str might add a \0 character to the end of the string which is stored in the internal char array of the host object (std::string):

s[s.size()] = '\0';

But it seems std::string objects are null-terminated by default, even before calling c_str.

After looking through the definition:

const _Elem *c_str() const _NOEXCEPT
{   // return pointer to null-terminated nonmutable array
    return (this->_Myptr());
}

I don't see any code which would add \0 to the end of a char array. As far as I can tell, c_str just returns a pointer to the char stored in the first element of the array, pretty much like begin() does. I don't even see code which checks that the internal array is terminated by \0.

Or am I missing something?

tuk asked Jan 05 '17 13:01




2 Answers

Before C++11, there was no requirement that a std::string (or the class template std::basic_string, of which std::string is a specialization) store a trailing '\0'. This was reflected in the different specifications of the data() and c_str() member functions: data() returned a pointer to the underlying data (which was not required to be terminated with a '\0'), while c_str() returned a pointer to an array with a terminating '\0', which an implementation was free to produce as a copy. Equally, however, nothing required an implementation NOT to store a trailing '\0' internally (accessing characters past the end of the stored data was undefined behaviour), and, for simplicity, some implementations chose to append a trailing '\0' anyway.

With C++11, this changed. Essentially, the data() member function was specified as giving the same effect as c_str() (i.e. the returned pointer is to the first character of an array that has a trailing '\0'). That has a consequence of requiring the trailing '\0' on the array returned by data(), and therefore on the internal representation.

So the behaviour you're seeing is consistent with C++11 - one of the invariants of the class is a trailing '\0' (i.e. constructors ensure that is the case, member functions which modify the string ensure it remains true, and all public member functions can rely on it being true).
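As a rough illustration of that invariant, here is a minimal sketch that checks the C++11 guarantees on a given string. The helper name has_trailing_null is made up for this example; it only uses standard std::string members and assumes a C++11 or later compiler.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of the standard library): returns true when
// the C++11 guarantees described above hold for s.
bool has_trailing_null(const std::string& s) {
    return s.c_str() == s.data()        // both return the same pointer since C++11
        && s.c_str()[s.size()] == '\0'  // terminator sits right after the last character
        && s[s.size()] == '\0';         // const operator[] at size() yields a null character
}
```

Calling has_trailing_null on an empty string, on an ordinary string, or on one with an embedded '\0' (for example std::string("a\0b", 3)) should return true in each case on a conforming C++11 implementation, without c_str() ever having to write anything.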

The behaviour you're seeing is not inconsistent with C++ standards before C++11. Strictly speaking, std::string before C++11 was not required to maintain a trailing '\0' but, equally, an implementer could choose to do so.

Peter answered Sep 30 '22 06:09


You do not see code that adds '\0' to the end of the sequence because the null character is already there. An implementation of c_str cannot return a pointer to a newly allocated array that the caller would have to free, so the array must be owned by the std::string object itself.

Hence, you have two valid approaches for implementing this:

  1. Always store '\0' at the end of _Myptr() array of characters on construction, or
  2. Make a copy of the string on demand, add '\0' when c_str() is called, and delete the copy in the destructor.

The first approach lets you return _Myptr() for c_str(), at the expense of storing an extra character for each string. The second approach requires an extra pointer per std::string object, plus an allocation and a copy when c_str() is first called, so the first approach is less expensive.
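The two approaches can be sketched with a pair of toy classes. EagerString and LazyString are hypothetical names invented for this illustration and are far simpler than any real std::string implementation (no copying, no modification, no small-string optimisation); they only show where the '\0' comes from in each design.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>

// Approach 1 (hypothetical EagerString): the buffer always holds size + 1
// characters, so c_str() only has to return the stored pointer.
class EagerString {
public:
    explicit EagerString(const char* text)
        : size_(std::strlen(text)), buf_(new char[size_ + 1]) {
        std::memcpy(buf_.get(), text, size_);
        buf_[size_] = '\0';  // terminator written once, at construction
    }
    std::size_t size() const { return size_; }
    const char* c_str() const { return buf_.get(); }

private:
    std::size_t size_;
    std::unique_ptr<char[]> buf_;
};

// Approach 2 (hypothetical LazyString): the buffer holds exactly size
// characters; a terminated copy is built the first time c_str() is called
// and released automatically when the object is destroyed.
class LazyString {
public:
    explicit LazyString(const char* text)
        : size_(std::strlen(text)), buf_(new char[size_]) {
        std::memcpy(buf_.get(), text, size_);
    }
    std::size_t size() const { return size_; }
    const char* c_str() const {
        if (!terminated_) {
            terminated_.reset(new char[size_ + 1]);
            std::memcpy(terminated_.get(), buf_.get(), size_);
            terminated_[size_] = '\0';
        }
        return terminated_.get();
    }

private:
    std::size_t size_;
    std::unique_ptr<char[]> buf_;
    mutable std::unique_ptr<char[]> terminated_;  // the extra pointer per object
};
```

In this sketch EagerString pays one extra char per string, while LazyString pays an extra pointer per object and a second allocation on the first c_str() call, which is the trade-off described above.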

Sergey Kalinichenko answered Sep 30 '22 06:09