 

Default size of std::vector / Programming books myth?

In a German programming book from 2012 called "C++ für C-Programmierer" (C++ for C programmers, duh!), which I bought as a reference, I found the following section in the chapter about the STL (I'll translate right away for you guys):

Most STL implementations are generous in terms of memory management. The allocation of vectors, for instance, is mostly done in 1 kB chunks. The waste does not really matter if one allocates a few vectors, but it does if you create tens to hundreds of thousands of them.

I could not find any source confirming this. I know it depends on the implementation, but I could not find confirmation for even one platform. Cplusplus.com merely states:

[...]

Therefore, compared to arrays, vectors consume more memory in exchange for the ability to manage storage and grow dynamically in an efficient way.

What have I tried so far?

I wrote a little C++ program using the OS X-specific malloc_size() function, but I have never used it before and I am pretty sure I am doing it wrong. If I do something along the lines of:

#include <malloc/malloc.h> // OS X-specific

std::vector<int>* i = new std::vector<int>;
std::cout << malloc_size(i) << std::endl;

Cout merely tells me 32, which is nowhere near 1 kB and would therefore prove the author partially wrong, but I am not really convinced by my own efforts.

Does anyone know better or know a resource? Thanks in advance.

Regards, Carson

asked May 29 '14 by hellerve


2 Answers

Your code doesn't measure what you want it to measure. The vector structure itself is usually quite small. It basically contains a few fields necessary for tracking the allocated memory and a pointer to that memory. What you want to measure is different.

------        -------------------
| i  |------> | A few fields    |
------        | (e.g., size and |
              |  capacity)      |         -------------------
              |-----------------|         | Space allocated |
              |   pointer       |-------> | for elements    |
              -------------------         -------------------
                 ^                            ^
               What your code               What you want to
               measures                     measure
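On a typical 64-bit implementation those fields boil down to three pointers. A rough sketch of a common layout (hypothetical, not the actual definition of any standard library):

template <typename T>
struct vector_layout {   // illustration only, not real library code
    T* begin;            // start of the element storage
    T* finish;           // one past the last element
    T* end_of_storage;   // one past the end of the allocated storage
};

Three 8-byte pointers are 24 bytes, and the OS X allocator rounds small blocks up to a 16-byte quantum, which would explain the 32 you measured.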

You can probably supply a custom allocator to vector that tracks and reports the size of requested allocations. The GCC 4.8.1 implementation on my computer allocates no memory for a default-constructed vector (since it has no elements), and uses the double-size-on-every-growth strategy noted in the comments.
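Such an allocator might look like this. This is a minimal sketch, not a production allocator; the class name ReportingAllocator and the output format are my own:

#include <cstddef>
#include <iostream>
#include <new>
#include <vector>

// Minimal C++11 allocator that prints the size of every allocation.
template <typename T>
struct ReportingAllocator {
    using value_type = T;

    ReportingAllocator() = default;
    template <typename U>
    ReportingAllocator(const ReportingAllocator<U>&) {}

    T* allocate(std::size_t n) {
        std::cout << "allocating " << n * sizeof(T) << " bytes\n";
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) { ::operator delete(p); }
};

template <typename T, typename U>
bool operator==(const ReportingAllocator<T>&, const ReportingAllocator<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const ReportingAllocator<T>&, const ReportingAllocator<U>&) { return false; }

int main() {
    std::vector<int, ReportingAllocator<int>> v; // prints nothing: no allocation yet
    for (int i = 0; i < 17; ++i)
        v.push_back(i);                          // prints one line per growth step
}

With a doubling implementation you would expect requests of 4, 8, 16, 32, 64, and 128 bytes for these 17 push_backs, nothing near the 1 kB chunks the book describes.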

answered by T.C.


The vector object itself consists of only a few pointers, so the 32-byte size you showed is not surprising, and it won't change over time.

I believe the text of the book is referring to the storage allocated for the contents of the vector. As you add items to the vector, it will allocate space to store them, but that space won't be reflected in malloc_size.

You can figure out how much space the vector has allocated for its elements by calling its capacity() method. This tells you how many items it can hold without reallocating. If you want the size in bytes, multiply the capacity by the size of the element type.
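For example (a minimal sketch; the exact capacity an implementation picks may exceed what you asked for):

#include <iostream>
#include <vector>

int main() {
    std::vector<int> v;
    for (int i = 0; i < 1000; ++i)
        v.push_back(i);
    // bytes allocated for elements vs. bytes actually in use
    std::cout << v.capacity() * sizeof(int) << " bytes allocated for "
              << v.size() * sizeof(int) << " bytes of data\n";
}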

The quoted text talks about 1 kB blocks. Older dynamic containers used linear schemes when they needed to grow: add a fixed amount of space on each reallocation. But the runtime complexity requirements that the standard places on std::vector don't allow that approach: with fixed-size increments, inserting n elements requires copying on the order of n² elements in total, while geometric growth keeps the total copying linear in n, so push_back stays amortized constant time. Instead, a vector must grow by some percentage of its current size.

Many implementations use 100%. So if a vector currently has room for 10 items and it needs to grow, it'll resize up to 20 items. If it needs to grow even further, it'll resize up to 40 items, and so on. Thus, in the worst case, you can end up with a vector that has allocated almost twice as much space as is actually needed. Some implementations use 50%, which still meets the runtime complexity requirements without growing as fast or "wasting" as much space. (There is at least one other advantage to using a factor less than 100%, but it's not relevant to this discussion.)
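You can watch your own implementation's growth factor directly (a small sketch; the exact sequence of capacities varies by standard library):

#include <cstddef>
#include <iostream>
#include <vector>

int main() {
    std::vector<int> v;
    std::size_t cap = v.capacity();
    for (int i = 0; i < 1000; ++i) {
        v.push_back(i);
        if (v.capacity() != cap) {   // a reallocation just happened
            cap = v.capacity();
            std::cout << "size " << v.size() << " -> capacity " << cap << '\n';
        }
    }
}

libstdc++ and libc++, for example, double the capacity; Microsoft's implementation grows by a factor of roughly 1.5.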

On a modern computer with virtual memory, either method is usually fine--the performance will be more important than the unused memory. If you're on an embedded system with limited resources, you might want to exercise more direct control. There are tricks like copy-and-swap that can prune a vector with excess capacity down to a size that's close to the actual need.
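The classic copy-and-swap trim looks like this (a minimal sketch; since C++11 there is also shrink_to_fit(), though that is only a non-binding request):

#include <iostream>
#include <vector>

int main() {
    std::vector<int> v(1000);
    v.resize(10);                      // size shrinks; capacity typically stays at 1000
    std::cout << v.capacity() << '\n';
    std::vector<int>(v).swap(v);       // the temporary copy is allocated with a tight capacity
    std::cout << v.capacity() << '\n'; // now (typically) 10
}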

answered by Adrian McCarthy