std::string implementation in GCC and its memory overhead for short strings

I am currently working on an application for a low-memory platform that requires an std::set of many short strings (>100,000 strings of 4-16 characters each). I recently transitioned this set from std::string to const char * to save memory and I was wondering whether I was really avoiding all that much overhead per string.

I tried using the following:

std::string sizeTest = "testString";
std::cout << sizeof(sizeTest) << " bytes";

But it just gave me an output of 4 bytes, indicating that the string contains a pointer. I'm well aware that strings store their data in a char * internally, but I thought the string class would have additional overhead.

Does the GCC implementation of std::string incur more overhead than sizeof(std::string) would indicate? More importantly, is it significant over this size of data set?

Here are the sizes of relevant types on my platform (it is 32-bit and has 8 bits per byte):

char: 1 byte
void *: 4 bytes
char *: 4 bytes
std::string: 4 bytes

Wheatevo asked Feb 20 '11 17:02



3 Answers

Well, at least with GCC 4.4.5, which is what I have handy on this machine, std::string is a typedef for std::basic_string<char>, and basic_string is defined in /usr/include/c++/4.4.5/bits/basic_string.h. There's a lot of indirection in that file, but what it comes down to is that nonempty std::strings store a pointer to one of these:

  struct _Rep_base
  {
    size_type     _M_length;
    size_type     _M_capacity;
    _Atomic_word  _M_refcount;
  };

This header is followed in memory by the actual string data. So std::string is going to have at least three words of overhead for each string (on top of the pointer held in the string object itself), plus any overhead from having a capacity higher than the length (probably none, depending on how you construct your strings; you can check by calling the capacity() method).
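To put rough numbers on that, here is a minimal sketch of the arithmetic. The helper name cow_rep_bytes is made up for illustration, and it assumes the layout shown above: a three-word header, then the characters, then a terminating NUL. Treat it as an estimate, not as something the library exposes.

```cpp
#include <cstddef>

// Hypothetical helper (not part of libstdc++): estimated bytes in the
// heap block that a non-empty refcounted string points at.
// Assumes a header of three machine words (length, capacity, refcount)
// followed by `capacity` characters and one terminating NUL.
constexpr std::size_t cow_rep_bytes(std::size_t capacity, std::size_t word_size)
{
    return 3 * word_size + capacity + 1;
}

// The same characters stored as a bare NUL-terminated char array.
constexpr std::size_t c_string_bytes(std::size_t length)
{
    return length + 1;
}
```

On the 32-bit platform from the question (4-byte words), cow_rep_bytes(10, 4) comes to 23 bytes for "testString", against c_string_bytes(10) == 11 for the plain char array. Either way you still pay 4 bytes for the pointer stored in the set.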

There's also going to be overhead from your memory allocator for doing lots of small allocations; I don't know what GCC uses for C++, but assuming it's similar to the dlmalloc allocator it uses for C, that could be at least two words per allocation, plus some space to align the size to a multiple of at least 8 bytes.
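The allocator estimate can be sketched the same way. This only encodes the guess above (two words of bookkeeping per allocation, total rounded up to a multiple of 8); the function names are hypothetical and a real malloc may well behave differently, so measure on your platform if the numbers matter.

```cpp
#include <cstddef>

// Round n up to the next multiple of `multiple` (multiple must be > 0).
constexpr std::size_t round_up(std::size_t n, std::size_t multiple)
{
    return (n + multiple - 1) / multiple * multiple;
}

// Hypothetical estimate of what one allocation really costs, assuming
// two words of allocator bookkeeping and a fixed alignment granularity.
constexpr std::size_t estimated_chunk_bytes(std::size_t request,
                                            std::size_t word_size,
                                            std::size_t alignment)
{
    return round_up(request + 2 * word_size, alignment);
}
```

With 4-byte words and 8-byte alignment, a 10-character string held by std::string requests 23 bytes (12-byte header, 10 characters, NUL), and estimated_chunk_bytes(23, 4, 8) gives 32. The same text as a bare char array requests 11 bytes, and estimated_chunk_bytes(11, 4, 8) gives 24. Under these assumptions the difference is about 8 bytes per string, so somewhere around 800 KB over 100,000 strings.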

nelhage answered Nov 08 '22 09:11


I'm going to guess you are on a 32-bit platform with 8 bits per byte. I'm also going to guess that, at least in the GCC version you are using, std::string uses a reference-counted implementation. The 4-byte sizeof you see is a pointer to a structure containing the reference count and the string data (and any allocator state if applicable).

In this GCC design, the only "short" string is the empty one (size == 0), which can share a representation with every other empty string. Any other string is a refcounted COW (copy-on-write) string.

To investigate this yourself, code up an allocator that keeps track of how much memory it allocates and deallocates, and how many times. Use this allocator to investigate the implementation of the container you're interested in.
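A minimal sketch of such a tracking allocator might look like the following. All the names here (AllocStats, alloc_stats, CountingAllocator, CountedString, CountedSet) are invented for the example. It assumes C++11 or later, where the short allocator interface is enough; a C++03 compiler such as GCC 4.4 would additionally need the older boilerplate (rebind, pointer typedefs, construct/destroy and so on). The counters are plain globals and not thread-safe.

```cpp
#include <cstddef>
#include <functional>
#include <new>
#include <set>
#include <string>

// Running totals for everything allocated through CountingAllocator.
struct AllocStats {
    std::size_t allocations = 0;
    std::size_t deallocations = 0;
    std::size_t bytes_allocated = 0;
    std::size_t bytes_freed = 0;
};

// Single shared instance of the counters.
inline AllocStats& alloc_stats()
{
    static AllocStats stats;
    return stats;
}

// Minimal C++11 allocator that forwards to operator new/delete and
// records how many bytes the container asked for.
template <class T>
struct CountingAllocator {
    using value_type = T;

    CountingAllocator() = default;
    template <class U>
    CountingAllocator(const CountingAllocator<U>&) noexcept {}

    T* allocate(std::size_t n)
    {
        alloc_stats().allocations += 1;
        alloc_stats().bytes_allocated += n * sizeof(T);
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }

    void deallocate(T* p, std::size_t n) noexcept
    {
        alloc_stats().deallocations += 1;
        alloc_stats().bytes_freed += n * sizeof(T);
        ::operator delete(p);
    }
};

// All instances are interchangeable, so they always compare equal.
template <class T, class U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) noexcept
{
    return true;
}

template <class T, class U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) noexcept
{
    return false;
}

// String and set types that report their allocations to alloc_stats().
using CountedString =
    std::basic_string<char, std::char_traits<char>, CountingAllocator<char>>;
using CountedSet =
    std::set<CountedString, std::less<CountedString>, CountingAllocator<CountedString>>;
```

Fill a CountedSet with representative strings and then read alloc_stats().bytes_allocated and alloc_stats().allocations to see what the container requested. Note that this counts requested bytes only, so the allocator's own per-block bookkeeping is not included. Also be aware that newer GCC releases (5 and later, with the default ABI) use a small-string buffer instead of the refcounted design, so short strings there may show no string allocations at all, only the set's node allocations.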

Howard Hinnant answered Nov 08 '22 10:11


If you are guaranteed to have ">100,000 strings of 4-16 characters each", then don't use std::string. Instead, write your own ShortString class. It's interesting that sizeof(std::string) == 4; how is that possible? What are sizeof(char) and sizeof(void *)?
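One possible shape for such a class, as a rough sketch only: it assumes no string is ever longer than 16 characters, stores the characters inline in a fixed buffer, and provides just enough to be used as a std::set key. The interface is invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Sketch of a fixed-capacity string: up to 16 characters plus a NUL,
// stored inline, with no heap allocation of its own.
class ShortString {
public:
    static const std::size_t max_length = 16;

    ShortString() { buf_[0] = '\0'; }

    // Precondition: s is NUL-terminated and at most max_length characters.
    // The assert disappears under NDEBUG, so validate untrusted input first.
    explicit ShortString(const char* s)
    {
        const std::size_t n = std::strlen(s);
        assert(n <= max_length);
        std::memcpy(buf_, s, n + 1);  // copy the characters and the NUL
    }

    const char* c_str() const { return buf_; }
    std::size_t size() const { return std::strlen(buf_); }

    // Ordering and equality by byte-wise comparison, enough for std::set.
    friend bool operator<(const ShortString& a, const ShortString& b)
    {
        return std::strcmp(a.buf_, b.buf_) < 0;
    }

    friend bool operator==(const ShortString& a, const ShortString& b)
    {
        return std::strcmp(a.buf_, b.buf_) == 0;
    }

private:
    char buf_[max_length + 1];
};
```

In a std::set<ShortString> the characters live inside the set's node, so there is no second allocation and no pointer per string. The trade-off is that every entry takes the full 17 bytes even when the text is only 4 characters long, so whether this beats const char * depends on your length distribution and your allocator's per-block overhead.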

albert answered Nov 08 '22 10:11