
C++11 internal std::string representation (libstdc++)

How is std::string internally represented in C++11 (libstdc++)?

While digging inside the implementation, I found:

/*  A string looks like this:
 *
 *                                        [_Rep]
 *                                        _M_length
 *   [basic_string<char_type>]            _M_capacity
 *   _M_dataplus                          _M_refcount
 *   _M_p ---------------->               unnamed array of char_type
 *
 *  Where the _M_p points to the first character in the string, and
 *  you cast it to a pointer-to-_Rep and subtract 1 to get a
 *  pointer to the header.
 *
 *  This approach has the enormous advantage that a string object
 *  requires only one allocation.  All the ugliness is confined
 *  within a single %pair of inline functions, which each compile to
 *  a single @a add instruction: _Rep::_M_data(), and
 *  string::_M_rep(); and the allocation function which gets a
 *  block of raw bytes and with room enough and constructs a _Rep
 *  object at the front.
 *
 *  The reason you want _M_data pointing to the character %array and
 *  not the _Rep is so that the debugger can see the string
 *  contents. (Probably we should add a non-inline member to get
 *  the _Rep for the debugger to use, so users can check the actual
 *  string length.)
 *
 *  Note that the _Rep object is a POD so that you can have a
 *  static <em>empty string</em> _Rep object already @a constructed before
 *  static constructors have run.  The reference-count encoding is
 *  chosen so that a 0 indicates one reference, so you never try to
 *  destroy the empty-string _Rep object.
 */
  // _Rep: string representation
  //   Invariants:
  //   1. String really contains _M_length + 1 characters: due to 21.3.4
  //      must be kept null-terminated.
  //   2. _M_capacity >= _M_length
  //      Allocated memory is always (_M_capacity + 1) * sizeof(_CharT).
  //   3. _M_refcount has three states:
  //      -1: leaked, one reference, no ref-copies allowed, non-const.
  //       0: one reference, non-const.
  //     n>0: n + 1 references, operations require a lock, const.
  //   4. All fields==0 is an empty string, given the extra storage
  //      beyond-the-end for a null terminator; thus, the shared
  //      empty string representation needs no constructor.
  struct _Rep_base
  {
    size_type       _M_length;
    size_type       _M_capacity;
    _Atomic_word    _M_refcount;
  };

I don't fully understand these comments:

  • Is std::string reference counted? How? _M_refcount is not a pointer, so if one string modifies it, the others can't see the change.
  • Does the buffer lie immediately after the header? If so, I don't really understand why.
Borzh asked Jul 18 '14 14:07



1 Answer

GCC moved away from the reference-counted string in GCC 5 to conform to the C++11 standard, but your program may still use it: libstdc++ keeps the old implementation for ABI compatibility (it is selected when _GLIBCXX_USE_CXX11_ABI is 0).
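To find out which implementation a given build uses, one option is to check libstdc++'s _GLIBCXX_USE_CXX11_ABI macro and, as a cross-check, see whether a copy shares the original's buffer. This is only a minimal sketch: StringAbi, detect_string_abi and copies_share_buffer are names made up for this example, and the buffer comparison merely observes behavior, it is not a documented interface.

```cpp
#include <cassert>
#include <string>

// Hypothetical classification used only in this example.
enum class StringAbi { Cxx11, LegacyCow, Unknown };

// libstdc++ defines _GLIBCXX_USE_CXX11_ABI once any of its headers is included:
// nonzero selects the new (non reference-counted) string, 0 the old one.
constexpr StringAbi detect_string_abi() {
#if defined(_GLIBCXX_USE_CXX11_ABI) && _GLIBCXX_USE_CXX11_ABI
    return StringAbi::Cxx11;
#elif defined(__GLIBCXX__)
    return StringAbi::LegacyCow;   // libstdc++ with the old string selected
#else
    return StringAbi::Unknown;     // not libstdc++
#endif
}

// Behavioral cross-check: with the reference-counted string, a copy points
// at the same characters as the original until one of them is modified.
inline bool copies_share_buffer() {
    const std::string a(64, 'x');  // long enough to rule out a small-string buffer
    const std::string b(a);
    assert(a == b);
    return a.data() == b.data();
}
```

Compiling with -D_GLIBCXX_USE_CXX11_ABI=0 on a recent GCC should switch both functions over to the old, reference-counted behavior.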

How it is refcounted

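std::string doesn't hold a _Rep_base member by value. It holds a single pointer, _M_p, into a heap-allocated block that starts with a _Rep (which inherits from _Rep_base). Copies of a string share that block, so they all see the same _M_refcount.

This is what the comment explains: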

 *  Where the _M_p points to the first character in the string, and
 *  you cast it to a pointer-to-_Rep and subtract 1 to get a
 *  pointer to the header.
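The sharing can be sketched with a toy class. This is not libstdc++'s code: Header and SharedStr are hypothetical names, the counter is a plain int rather than an atomic, and the "leaked" state (-1) is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <new>

// Hypothetical stand-in for _Rep: the header that precedes the characters.
struct Header {
    std::size_t length;
    std::size_t capacity;
    int         refcount;   // 0 means one owner, n > 0 means n + 1 owners
};

// Hypothetical handle: it stores only a pointer to the first character.
class SharedStr {
    char* p_;

    // "Cast it to a pointer-to-header and subtract 1" to reach the header.
    Header* rep() const { return reinterpret_cast<Header*>(p_) - 1; }

public:
    explicit SharedStr(const char* s) {
        assert(s != nullptr);
        const std::size_t n = std::strlen(s);
        // One allocation holds the header followed by n + 1 characters.
        void* raw = ::operator new(sizeof(Header) + n + 1);
        Header* h = new (raw) Header{n, n, 0};
        p_ = reinterpret_cast<char*>(h + 1);
        std::memcpy(p_, s, n + 1);
    }

    // Copying shares the block and bumps the counter stored in its header.
    SharedStr(const SharedStr& other) : p_(other.p_) { ++rep()->refcount; }
    SharedStr& operator=(const SharedStr&) = delete;  // omitted for brevity

    ~SharedStr() {
        Header* h = rep();
        if (h->refcount == 0) ::operator delete(h);   // last owner frees the block
        else --h->refcount;
    }

    const char* c_str() const { return p_; }
    std::size_t size() const { return rep()->length; }
    int refcount() const { return rep()->refcount; }
};
```

Because every copy reaches the same Header through its own pointer, an increment made by one handle is visible to all of them, which is why the counter itself does not need to be a pointer.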

The buffer lies after the header...

Yes, but both live in the same heap-allocated block: the characters directly follow the header of the _Rep object, and your string holds only a pointer into that block.
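The layout arithmetic can be written out as follows. It is a sketch under assumptions: RepHeader, Handle, block_bytes, place and header_of are illustrative names, and the header uses a plain int where libstdc++ uses _Atomic_word.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <new>

// Hypothetical header mirroring the three _Rep_base fields.
struct RepHeader {
    std::size_t length;
    std::size_t capacity;
    int         refcount;
};

// Hypothetical string handle: one pointer and nothing else.
struct Handle {
    char* p;  // points at the first character, not at the header
};
static_assert(sizeof(Handle) == sizeof(char*), "handle is pointer-sized");

// Size of the single block: header, then capacity characters, then '\0'.
constexpr std::size_t block_bytes(std::size_t capacity) {
    return sizeof(RepHeader) + (capacity + 1) * sizeof(char);
}

// Construct the header at the front of a caller-provided block and copy
// the characters right behind it.
inline Handle place(void* block, const char* s, std::size_t capacity) {
    const std::size_t n = std::strlen(s);
    assert(n <= capacity);
    RepHeader* h = new (block) RepHeader{n, capacity, 0};
    char* chars = reinterpret_cast<char*>(h + 1);
    std::memcpy(chars, s, n + 1);
    return Handle{chars};
}

// Step back over exactly one header to recover it from the handle.
inline RepHeader* header_of(Handle s) {
    return reinterpret_cast<RepHeader*>(s.p) - 1;
}
```

With this layout the string object stays one pointer wide, a debugger that follows the pointer lands directly on the characters, and the length, capacity and reference count are found by stepping back one header.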

Lectem answered Sep 22 '22 15:09