
Why does Microsoft's implementation of std::string require 40 bytes on the stack?

Tags: c++, string

Having recently watched this video about Facebook's implementation of string, I was curious to see the internals of Microsoft's implementation. Unfortunately, the string file (in %VisualStudioDirectory%/VC/include) doesn't seem to contain the actual definition, only conversion functions (e.g. stoi) and some operator overloads.

I decided to do some poking and prodding at it from user-level programs. The first thing I did, of course, was to test sizeof(std::string). To my surprise, std::string takes 40 bytes! (On 64-bit machines, anyway.) The previously mentioned video goes into detail about how Facebook's implementation only requires 24 bytes and gcc's takes 32 bytes, so this was shocking to say the least.

We can dig a little deeper by writing a simple program that prints the contents of the object byte by byte (along with the c_str address):

#include <cstddef>   // size_t
#include <cstdint>   // uint64_t
#include <iostream>
#include <string>
int main()
{
    std::string test = "this is a very, very, very long string";

    // Print contents of std::string test.
    char* data = reinterpret_cast<char*>(&test);
    for (size_t wordNum = 0; wordNum < sizeof(std::string); wordNum = wordNum + sizeof(uint64_t))
    {
        for (size_t i = 0; i < sizeof(uint64_t); i++)
            std::cout << (int)(data[wordNum + i]) << " ";

        std::cout << std::endl;
    }

    // Print the value of the address returned by test.c_str().
    // (Doing this byte-by-byte to match the above values).
    const char* testAddr = test.c_str();
    char* dataAddr = reinterpret_cast<char*>(&testAddr);

    std::cout << "c_str address: ";
    for (size_t i = 0; i < sizeof(const char*); i++)
        std::cout << (int)(dataAddr[i]) << " ";

    std::cout << std::endl;
}

This prints out:

48 33 -99 -47 -55 1 0 0
16 78 -100 -47 -55 1 0 0
-52 -52 -52 -52 -52 -52 -52 -52
38 0 0 0 0 0 0 0
47 0 0 0 0 0 0 0
c_str address: 16 78 -100 -47 -55 1 0 0

Examining this, we can see that the second word contains the address of the allocated string data, the third word is garbage (the unused second half of the 16-byte buffer used for Short String Optimization, whose first half holds the pointer here), the fourth word is the size, and the fifth word is the capacity. But what about the first word? It appears to be an address, but what for? Shouldn't everything already be accounted for?

For the sake of completeness, the following output shows SSO (the string is set to "Short String"). Note that the first word still seems to represent a pointer:

0 36 -28 19 123 1 0 0
83 104 111 114 116 32 83 116
114 105 110 103 0 -52 -52 -52
12 0 0 0 0 0 0 0
15 0 0 0 0 0 0 0
c_str address: 112 -9 79 -108 23 0 0 0

EDIT: OK, so having done more testing, it appears that the size of std::string actually drops to 32 bytes when compiled for release, and the first word is no longer there. But I'm still really interested in knowing why that is the case, and what that extra pointer is used for in debug mode.

Update: As per the tip by the user Yuushi, the extra word appears to be related to Debug Iterator Support. This was verified when I turned off Debug Iterator Support (an example for doing this is shown here) and the size of std::string was reduced to 32 bytes, with the first word now missing.

However, it would still be really interesting to see how Debug Iterator Support uses that additional pointer to check for incorrect iterator use.

Asked by user7107685, Oct 18 '22 23:10


1 Answer

Visual Studio 2015 uses the xstring header, rather than string, to define std::basic_string.

NOTE: This answer applies to VS2015 only. VS2013 uses a different implementation, although the two are more or less the same.

It's implemented as:

template<class _Elem,
class _Traits,
class _Alloc>
class basic_string
    : public _String_alloc<_String_base_types<_Elem, _Alloc> >
{
// This class has no member data
};

_String_alloc uses a _Compressed_pair<_Alty, _String_val<_Val_types> > to store its data. In std::string, _Alty is std::allocator<char> and _Val_types is _Simple_types<char>. Because std::is_empty<std::allocator<char>>::value is true, sizeof(_Compressed_pair<_Alty, _String_val<_Val_types> >) is the same as sizeof(_String_val<_Val_types>).

_String_val inherits from _Container_base, which is a typedef for _Container_base0 when _ITERATOR_DEBUG_LEVEL == 0 and for _Container_base12 otherwise. The difference between them is that _Container_base12 holds a pointer to a _Container_proxy, used for debugging purposes. Besides that, _String_val also has the following members:

union _Bxty
    {   // storage for small buffer or pointer to larger one
    _Bxty()
        {   // user-provided, for fancy pointers
        }

    ~_Bxty() _NOEXCEPT
        {   // user-provided, for fancy pointers
        }

    value_type _Buf[_BUF_SIZE];
    pointer _Ptr;
    char _Alias[_BUF_SIZE]; // to permit aliasing
    } _Bx;

size_type _Mysize;  // current length of string
size_type _Myres;   // current storage reserved for string

Here _BUF_SIZE is 16 for std::string.

On this system, pointer and size_type have the same size and alignment, so the members pack together without any padding.

Hence, when _ITERATOR_DEBUG_LEVEL == 0, sizeof(std::string) is:

_BUF_SIZE + 2 * sizeof(size_type)

which is 16 + 2 * 8 = 32 bytes on a 64-bit build. Otherwise it is:

sizeof(pointer) + _BUF_SIZE + 2 * sizeof(size_type)

which is 8 + 16 + 2 * 8 = 40 bytes.
Answered by Danh, Oct 21 '22 06:10