Why does sizeof(std::string) yield 8? I thought it should be more than 8, as it has to have an int data member (sizeof(int) == 8 on my machine) to return std::string::length() and std::string::size() in O(1), and probably a char* for the characters.
Example for std::string::size: the size of str is 22 bytes.
So sizeof for this string implementation is 32 because that is how this implementation was built; it will be 16 in other implementations and 64 in yet another. The size of the string object will (like water) depend on the environment it is used in.
std::string actually maintains the size as one of its data members.
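To make the distinction concrete, here is a minimal sketch that separates the two notions of "size". The names handle_size and stored_chars are hypothetical helpers introduced only for illustration; the actual value of sizeof(std::string) depends on your standard library and is deliberately not stated here.

```cpp
#include <cstddef>
#include <string>

// sizeof(std::string) is a compile-time constant chosen by the library
// implementation; it does not change with the text an object holds.
constexpr std::size_t handle_size = sizeof(std::string);

// The number of chars currently stored is a runtime property of one object.
// (stored_chars is a hypothetical helper, not part of the standard library.)
std::size_t stored_chars(const std::string& s) {
    return s.size();
}
```

Calling stored_chars on an empty string and on a 1000-character string gives 0 and 1000, while sizeof applied to either object gives the same handle_size.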
In C++, string length really represents the number of bytes used to encode the given string. Since one byte in C++ usually maps to one character, this metric mostly means “number of characters,” too.
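As a rough illustration of where bytes and characters diverge, the sketch below counts UTF-8 code points and can be compared with size(). It assumes the string holds valid UTF-8; count_code_points is a hypothetical helper, not a standard library function.

```cpp
#include <cstddef>
#include <string>

// Counts UTF-8 code points by skipping continuation bytes (bit pattern 10xxxxxx).
// Assumption: the input is valid UTF-8. This is a hypothetical helper.
std::size_t count_code_points(const std::string& utf8) {
    std::size_t n = 0;
    for (unsigned char c : utf8) {
        if ((c & 0xC0) != 0x80) {
            ++n;  // lead byte or plain ASCII byte starts a new code point
        }
    }
    return n;
}
```

For the UTF-8 bytes of "naïve" (where "ï" is encoded as the two bytes 0xC3 0xAF), size() reports 6 bytes while count_code_points reports 5 code points. For pure ASCII text the two numbers agree.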
The implementation of std::string is not specified by the C++ standard. It only describes the class's behaviour. However, I would expect there to be more than one pointer's worth of information in the class. In particular, it needs a pointer to the characters, the current length, and the allocated capacity.
It MAY of course store all these in a dynamically allocated location, and thus take up exactly the same amount of space as char* [in most architectures].
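A minimal sketch of that idea, assuming a mainstream ABI where a struct containing a single pointer is exactly one pointer wide. HeapState and TinyString are hypothetical types invented for illustration; they are not how any particular standard library spells its string.

```cpp
#include <cstddef>

// Hypothetical bookkeeping block that lives on the heap, not in the handle.
struct HeapState {
    std::size_t length;    // characters currently stored
    std::size_t capacity;  // characters that fit without reallocating
    char*       chars;     // the character data itself
};

// Hypothetical string handle: its only data member is one pointer.
struct TinyString {
    HeapState* state;
};

// Assumption: no padding is added around a lone pointer member,
// which holds on the common desktop and server ABIs.
static_assert(sizeof(TinyString) == sizeof(HeapState*),
              "handle is exactly one pointer wide");
```

With this layout, sizeof the handle says nothing about how much text is stored; all of that is reachable only through the pointer.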
In fact, looking at the C++ header that comes with my Linux machine, the implementation is quite clear (the code is, as per its comments, "pre-C++11", but I think it is roughly representative either way):
size_type
length() const _GLIBCXX_NOEXCEPT
{ return _M_rep()->_M_length; }
and then follow that to:
_Rep*
_M_rep() const _GLIBCXX_NOEXCEPT
{ return &((reinterpret_cast<_Rep*> (_M_data()))[-1]); }
which in turn leads to:
_CharT*
_M_data() const _GLIBCXX_NOEXCEPT
{ return _M_dataplus._M_p; }
which leads to:
// Data Members (private):
mutable _Alloc_hider _M_dataplus;
and then we get to:
struct _Alloc_hider : _Alloc
{
    _Alloc_hider(_CharT* __dat, const _Alloc& __a) _GLIBCXX_NOEXCEPT
    : _Alloc(__a), _M_p(__dat) { }
    _CharT* _M_p; // The actual data.
};
The actual data about the string is:
struct _Rep_base
{
    size_type _M_length;
    size_type _M_capacity;
    _Atomic_word _M_refcount;
};
So, it's all a simple pointer called _M_p hidden inside several layers of getters and a bit of casting...
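The "header just before the characters" trick can be sketched in a few lines. This is a simplified illustration loosely modelled on the fields quoted above, not the library's actual code: Header, PtrString, make_ptr_string, header_of, length_of and destroy are all hypothetical names, there is no reference counting, and the pointer casts rely on behaviour that works on mainstream compilers rather than on strict standard guarantees.

```cpp
#include <cstddef>
#include <cstring>
#include <new>

// Hypothetical header stored immediately before the characters.
struct Header {
    std::size_t length;
    std::size_t capacity;
};

// Hypothetical handle: one pointer to the first character, nothing else.
struct PtrString {
    char* p;
};

// Allocates header and characters in one block and points at the characters.
PtrString make_ptr_string(const char* text) {
    const std::size_t len = std::strlen(text);
    void* raw = ::operator new(sizeof(Header) + len + 1);
    Header* h = new (raw) Header{len, len};
    char* chars = reinterpret_cast<char*>(h + 1);
    std::memcpy(chars, text, len + 1);  // copies the terminating '\0' too
    return PtrString{chars};
}

// Steps back one Header from the character pointer, much like the [-1] above.
Header* header_of(const PtrString& s) {
    return reinterpret_cast<Header*>(s.p) - 1;
}

// O(1) length lookup through the hidden header.
std::size_t length_of(const PtrString& s) {
    return header_of(s)->length;
}

// Releases the single allocation that holds header and characters.
void destroy(PtrString& s) {
    ::operator delete(header_of(s));
    s.p = nullptr;
}
```

A PtrString built from "hello" is one pointer wide, yet length_of still answers 5 in constant time because the length sits in the heap block just in front of the characters.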
Because all your implementation of std::string stores is a pointer to the heap, where all of its data is stored.