What actually is done when string::c_str() is invoked?
Does string::c_str() allocate memory, copy the internal data of the string object, and append a null terminator to the newly allocated memory?
Or is it the case that string::c_str() must be O(1), so allocating memory and copying the string over is no longer allowed? In practice having the null terminator there all the time is the only sane implementation.
Somebody in the comments of this answer of this question says that C++11 requires that std::string allocate an extra char for a trailing '\0'. So it seems the second option is possible.
And another person says that std::string operations (e.g. iteration, concatenation and element mutation) don't need the zero terminator. Unless you pass the string to a function expecting a zero-terminated string, it can be omitted.
And here is another voice, from an expert:
Why is it common for implementers to make .data() and .c_str() do the same thing?
Because it is more efficient to do so. The only way to make .data() return something that is not null terminated, would be to have .c_str() or .data() copy their internal buffer, or to just use 2 buffers. Having a single null terminated buffer always means that you can always use just one internal buffer when implementing std::string.
So I am really confused now: what actually is done when string::c_str() is invoked?
Update:
Suppose c_str() is implemented as simply returning the pointer to the buffer that the string has already allocated and manages.
A. Since the result of c_str() must be null-terminated, the internal buffer always needs to be null-terminated, even for an empty std::string. For example, given std::string demo_str; there should be a \0 in the internal memory of demo_str. Am I right?
B. What happens when std::string::substr() is invoked? Does it automatically append a \0 to the substring?
The c_str() method converts a string to an array of characters with a null character at the end. The function takes in no parameters and returns a pointer to this character array (also called a c-string).
The c_str method of std::string returns a raw pointer to the memory buffer owned by the std::string.
std::string::c_str Returns a pointer to an array that contains a null-terminated sequence of characters (i.e., a C-string) representing the current value of the string object.
See this. The c_str() member function returns a const char * pointer to the string.
Since C++11, std::string::c_str() and std::string::data() are both required to return a pointer to the string's internal buffer. And since the result of c_str() must be null-terminated (before C++11 this was guaranteed only for c_str(), not for data()), that effectively requires the internal buffer to always be null-terminated, though the null terminator is not counted by size()/length(), or returned by std::string iterators, etc.
Prior to C++11, the behavior of c_str() was technically implementation-specific, but most implementations I've ever seen worked this way, as it is the simplest and sanest way to implement it. C++11 just standardized the behavior that was already in wide use.
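As a rough illustration of the guarantee described above, here is a minimal sketch assuming a C++11 or later compiler. The helper name shares_terminated_buffer is made up for this example and is not part of any library.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: true if c_str() and data() expose the same
// null-terminated buffer for the given string.
bool shares_terminated_buffer(const std::string& s)
{
    const char* p = s.c_str();
    return p == s.data()           // same pointer since C++11
        && p[s.size()] == '\0'     // terminator sits just past the last character
        && s[s.size()] == '\0';    // operator[] at size() also yields '\0'
}
```

Note that size() never counts the terminator, so std::string("hello").size() is still 5.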
UPDATE
Since C++11, the buffer is always null-terminated, even for an empty string. However, that does not mean the buffer is required to be dynamically allocated when the string is empty. It could point to an SSO buffer, or even to a single static nul character. There is no guarantee that the pointer returned by c_str()/data() remains pointing at the same memory address as the content of the string changes.
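A small sketch of both points, assuming C++11 or later. The function names are invented for this example. Whether the append actually reallocates is up to the implementation, which is exactly why the old pointer is not used afterwards.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: an empty string still yields a valid C string.
bool empty_is_terminated()
{
    std::string demo_str;
    return demo_str.c_str() != nullptr && demo_str.c_str()[0] == '\0';
}

// Hypothetical helper: re-fetch the pointer after a mutation instead of caching it.
char last_char_after_growth()
{
    std::string s = "ab";
    const char* before = s.c_str();  // may be invalidated by the append below
    s.append(1000, 'x');             // may reallocate; 'before' must not be dereferenced
    (void)before;
    const char* after = s.c_str();   // fresh pointer into the current buffer
    return after[s.size() - 1];      // last character of the grown string
}
```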
std::string::substr() returns a new std::string with its own null-terminated buffer. The string being copied from is unaffected.
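To make the substr() behavior concrete, here is a short sketch. The helper name substr_has_own_terminator is an assumption of this example only.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical helper: checks that a substring carries its own terminator
// and that the source string is left untouched.
bool substr_has_own_terminator()
{
    std::string whole = "hello world";
    std::string part = whole.substr(0, 5);   // new string holding "hello"
    const char* p = part.c_str();
    return std::strlen(p) == 5               // strlen stops at part's own '\0'
        && p[5] == '\0'
        && whole == "hello world"            // source content is unchanged
        && whole.c_str()[5] == ' ';          // no '\0' was written into the source
}
```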
Here is an empirical "proof" that the complexity of .c_str() is O(1):
    #include <string>

    int main()
    {
        std::string x(5000000, 'b'); // <--- single time allocation
        // std::string x(5, 'b');    // <--- compare to a much shorter string
        for (unsigned int i = 0; i < 1000000; i++)
        {
            const char *y = x.c_str(); // <--- copy entire content ?
            (void)y;                   // silence the unused-variable warning
        }
    }
Compile with -O0 to avoid optimizing out anything. The run time should not depend on the length of the string, which indicates that nothing is copied when c_str() is called.
There are a lot of great answers and comments already provided. But to demonstrate that std::string is typically backed by a null-terminated string, I've provided a simple, yet naive implementation. It's not complete, doesn't do error checking, and is certainly not optimized. But it's complete enough to show you how a string class is typically implemented with a null-terminated buffer as a member variable.
    #include <cstddef>  // size_t
    #include <cstring>  // strlen, memcpy

    class string
    {
    public:
        string()
        {
            assign("", 0);
        }
        string(const char* s)
        {
            assign(s, strlen(s));
        }
        string(const char* s, size_t len)
        {
            assign(s, len);
        }
        string(const string& s)
        {
            assign(s._ptr, s._len);
        }
        ~string()
        {
            delete [] _ptr;
        }
        string& operator=(const string& s)
        {
            // Copy first, free afterwards, so self-assignment stays safe.
            const char* oldptr = _ptr;
            assign(s._ptr, s._len);
            delete [] oldptr;
            return *this;
        }
        // data() and c_str() both return the same null-terminated buffer.
        const char* data() const
        {
            return _ptr;
        }
        const char* c_str() const
        {
            return _ptr;
        }
        size_t length() const
        {
            return _len;
        }
        // substr always returns a new string (no bounds checking here)
        string substr(size_t pos, size_t count) const
        {
            string s(_ptr + pos, count);
            return s;
        }
    private:
        char* _ptr;
        size_t _len;
        void assign(const char* ptr, size_t len)
        {
            _len = len;
            _ptr = new char[_len + 1]; // +1 for null termination
            memcpy(_ptr, ptr, len);
            _ptr[_len] = '\0';         // always null terminate
        }
    };