
What actually is done when `string::c_str()` is invoked?

Tags: c++, string, stl

  1. Does string::c_str() allocate memory, copy the internal data of the string object into it, and append a null terminator to the newly allocated memory?

or

  2. Since string::c_str() must be O(1), allocating memory and copying the string is not allowed. In practice, keeping the null terminator in place at all times is the only sane implementation.

Somebody commenting on an answer to a related question says that C++11 requires std::string to allocate an extra char for a trailing '\0'. So the second option seems plausible.

And another person says that std::string operations - e.g. iteration, concatenation and element mutation - don't need the zero terminator. Unless you pass the string to a function expecting a zero terminated string, it can be omitted.

And another expert weighs in:

Why is it common for implementers to make .data() and .c_str() do the same thing?

Because it is more efficient to do so. The only way to make .data() return something that is not null terminated, would be to have .c_str() or .data() copy their internal buffer, or to just use 2 buffers. Having a single null terminated buffer always means that you can always use just one internal buffer when implementing std::string.

So I am really confused now: what is actually done when string::c_str() is invoked?

Update:

Suppose c_str() is implemented as simply returning a pointer to the buffer that the string has already allocated and manages. Then:

A. Since the result of c_str() must be null-terminated, the internal buffer always needs to be null-terminated, even for an empty std::string. For example, given std::string demo_str;, there should be a \0 in the internal memory of demo_str. Am I right?

B. What happens when std::string::substr() is invoked? Is a \0 automatically appended to the substring?

asked Sep 25 '21 04:09 by John



3 Answers

Since C++11, std::string::c_str() and std::string::data() are both required to return a pointer to the string's internal buffer, and both must be null-terminated (before C++11, only c_str() had that requirement). That effectively requires the internal buffer to always be null-terminated, though the null terminator is not counted by size()/length(), or returned by std::string iterators, etc.
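As a rough illustration of the paragraph above, here is a minimal sketch that checks these properties on whatever standard library it is compiled against. The helper names shares_terminated_buffer and terminator_is_not_counted are made up for this example, and C++11 or later is assumed.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical helper: true when data() and c_str() expose the same
// null-terminated buffer.
bool shares_terminated_buffer(const std::string& s)
{
    const char* p = s.c_str();
    return p == s.data()           // same pointer since C++11
        && p[s.size()] == '\0';    // terminator sits just past the last counted char
}

// Hypothetical helper: checks that the terminator is not part of size().
void terminator_is_not_counted()
{
    const std::string s = "hello";
    assert(s.size() == 5);                  // five characters, no terminator counted
    assert(std::strlen(s.c_str()) == 5);    // strlen stops at the terminator
    assert(shares_terminated_buffer(s));
}
```

Calling terminator_is_not_counted() from main() should pass silently on a conforming C++11 library.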

Prior to C++11, the behavior of c_str() was technically implementation-specific, but most implementations I've ever seen worked this way, as it is the simplest and sanest way to implement it. C++11 just standardized the behavior that was already in wide use.

UPDATE

Since C++11, the buffer is always null-terminated, even for an empty string. However, that does not mean the buffer is required to be dynamically allocated when the string is empty. It could point to an SSO buffer, or even to a single static nul character. There is no guarantee that the pointer returned by c_str()/data() remains pointing at the same memory address as the content of the string changes.
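A small sketch of both points, assuming C++11 or later: the empty string still yields a valid C string, and the pointer should be fetched again after the string is modified rather than cached. The function names below are hypothetical, and demo_str is borrowed from the question.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: true if a default-constructed string still
// yields a valid, empty C string.
bool empty_string_is_terminated()
{
    const std::string demo_str;
    return demo_str.size() == 0 && demo_str.c_str()[0] == '\0';
}

// Hypothetical helper: grows a string and re-fetches the pointer afterwards.
std::size_t length_after_growth()
{
    std::string s = "hi";
    const char* before = s.c_str();
    (void)before;                   // may dangle once s grows; do not use it afterwards
    s.append(1000, 'x');            // likely forces a move to a larger buffer
    const char* after = s.c_str();  // fetch the pointer again after the mutation
    return std::strlen(after);      // 2 original chars + 1000 appended
}
```

Whether the address actually changes depends on the implementation (SSO capacity, growth policy), which is why the sketch never compares the two pointers.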

std::string::substr() returns a new std::string with its own null-terminated buffer. The string being copied from is unaffected.
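To make the substr() behaviour concrete, here is a minimal sketch; the function name substr_has_own_buffer is invented for this example.

```cpp
#include <string>

// Hypothetical helper: checks that substr() produces an independent,
// null-terminated string and leaves the source untouched.
bool substr_has_own_buffer()
{
    const std::string original = "hello world";
    const std::string sub = original.substr(6, 5);   // copies "world"

    return sub == "world"
        && sub.c_str()[sub.size()] == '\0'           // the copy has its own terminator
        && sub.c_str() != original.c_str() + 6       // a copy, not a view into original
        && original == "hello world";                // source is unchanged
}
```

Note that original still has a space, not a '\0', at index 5: nothing is written into the source string.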

answered Oct 10 '22 12:10 by Remy Lebeau


Here is an empirical "proof" that the complexity of .c_str() is O(1):

#include <string>

int main()
{
    std::string x(5000000, 'b'); // <--- single allocation, done once
    // std::string x(5, 'b'); // <--- compare to a much shorter string
    for (unsigned int i = 0; i < 1000000; i++)
    {
        const char *y = x.c_str(); // <--- would this copy the entire content?
        (void)y; // silence the unused-variable warning
    }
}
  • compiled with -O0 so that nothing is optimized out
  • timing the 2 versions, I get identical performance
  • this is an empirical "proof" that, at least in my machine's implementation, .c_str()
    • returns the internal representation, which is already a null-terminated string
    • doesn't copy the content every time it is called.
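Timing aside, the same conclusion can be checked deterministically. The sketch below (the function name is hypothetical) compares pointers rather than measuring time: if c_str() copied the content, repeated calls could not keep returning the address of the string's own first element.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: true when c_str() hands back the string's own
// storage rather than a fresh copy.
bool c_str_returns_internal_buffer(std::size_t n)
{
    const std::string x(n, 'b');
    const char* first = x.c_str();
    const char* second = x.c_str();
    return first == second    // no new buffer per call
        && first == &x[0];    // points at the string's own first element
}
```

This should hold for both the 5-character and the 5,000,000-character string from the benchmark above.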
answered Oct 10 '22 12:10 by OrenIshShalom


There are a lot of great answers and comments already. But to demonstrate that std::string is typically backed by a null-terminated buffer, I've provided a simple but naive implementation. It's not complete, doesn't do error checking, and is certainly not optimized. But it's complete enough to show how a string class is typically implemented with a null-terminated buffer as a member variable.

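#include <cstddef>  // std::size_t
#include <cstring>  // std::strlen, std::memcpy

class string
{
public:

    string()
    {
        assign("", 0);
    }

    string(const char* s)
    {
        assign(s, std::strlen(s));
    }

    string(const char* s, std::size_t len)
    {
        assign(s, len);
    }

    string(const string& s)
    {
        assign(s._ptr, s._len);
    }

    ~string()
    {
        delete [] _ptr;
    }

    string& operator=(const string& s)
    {
        // Copy first, free afterwards, so that self-assignment stays safe.
        const char* oldptr = _ptr;
        assign(s._ptr, s._len);
        delete [] oldptr;
        return *this;
    }

    // data() and c_str() return the same null-terminated buffer
    const char* data() const
    {
        return _ptr;
    }

    const char* c_str() const
    {
        return _ptr;
    }

    std::size_t length() const
    {
        return _len;
    }

    // substr always returns a new string with its own buffer
    // (no bounds checking: the caller must keep pos + count <= length())
    string substr(std::size_t pos, std::size_t count) const
    {
        string s(_ptr + pos, count);
        return s;
    }

private:
    char* _ptr;
    std::size_t _len;

    // Allocates a fresh buffer; the caller releases any previous one.
    void assign(const char* ptr, std::size_t len)
    {
        _len = len;
        _ptr = new char[_len + 1]; // +1 for null termination
        std::memcpy(_ptr, ptr, len);
        _ptr[_len] = '\0';         // always null terminate
    }
};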
answered Oct 10 '22 12:10 by selbie