How std::string is internally represented in c++11 (libstdc++)?
While digging inside the implementation, I found:
/* A string looks like this:
*
* [_Rep]
* _M_length
* [basic_string<char_type>] _M_capacity
* _M_dataplus _M_refcount
* _M_p ----------------> unnamed array of char_type
*
* Where the _M_p points to the first character in the string, and
* you cast it to a pointer-to-_Rep and subtract 1 to get a
* pointer to the header.
*
* This approach has the enormous advantage that a string object
* requires only one allocation. All the ugliness is confined
* within a single %pair of inline functions, which each compile to
* a single @a add instruction: _Rep::_M_data(), and
* string::_M_rep(); and the allocation function which gets a
* block of raw bytes and with room enough and constructs a _Rep
* object at the front.
*
* The reason you want _M_data pointing to the character %array and
* not the _Rep is so that the debugger can see the string
* contents. (Probably we should add a non-inline member to get
* the _Rep for the debugger to use, so users can check the actual
* string length.)
*
* Note that the _Rep object is a POD so that you can have a
* static <em>empty string</em> _Rep object already @a constructed before
* static constructors have run. The reference-count encoding is
* chosen so that a 0 indicates one reference, so you never try to
* destroy the empty-string _Rep object.
*/
// _Rep: string representation
// Invariants:
// 1. String really contains _M_length + 1 characters: due to 21.3.4
// must be kept null-terminated.
// 2. _M_capacity >= _M_length
// Allocated memory is always (_M_capacity + 1) * sizeof(_CharT).
// 3. _M_refcount has three states:
// -1: leaked, one reference, no ref-copies allowed, non-const.
// 0: one reference, non-const.
// n>0: n + 1 references, operations require a lock, const.
// 4. All fields==0 is an empty string, given the extra storage
// beyond-the-end for a null terminator; thus, the shared
// empty string representation needs no constructor.
struct _Rep_base
{
size_type _M_length;
size_type _M_capacity;
_Atomic_word _M_refcount;
};
I don't understand those comments very much:
std::string class in C++ C++ has in its definition a way to represent a sequence of characters as an object of the class. This class is called std:: string. String class stores the characters as a sequence of bytes with the functionality of allowing access to the single-byte character.
The std::string class manages the underlying storage for you, storing your strings in a contiguous manner. You can get access to this underlying buffer using the c_str() member function, which will return a pointer to null-terminated char array. This allows std::string to interoperate with C-string APIs.
In C++ you should in almost all cases use std::string instead of a raw char array. std::string manages the underlying memory for you, which is by itself a good enough reason to prefer it.
std::string::dataReturns a pointer to an array that contains the same sequence of characters as the characters that make up the value of the string object.
GCC did move away from the refcounted string to follow the c++11 standard, but note that it is possible that your program will use it as part of the ABI compatibility implementation.
std::string
doesn't have a _Rep_Base
member but a pointer to _Rep
with _Rep
inheriting from _Rep_Base
It is what is explained here :
* Where the _M_p points to the first character in the string, and
* you cast it to a pointer-to-_Rep and subtract 1 to get a
* pointer to the header.
Yes, but after the header of the _Rep object, and your string only has a pointer to it.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With