I have noticed some interesting behavior in Linux with regard to the Memory Usage (RES) reported by top
. I have attached the following program which allocates a couple million objects on the heap, each of which has a buffer that is around 1 kilobyte. The pointers to those objects are tracked by either a std::list
, or a std::vector
. The interesting behavior I have noticed is that if I use a std::list
, the Memory Usage reported by top
never changes during the sleep periods. However if I use std::vector
, the memory usage will drop to near 0 during those sleeps.
My test configuration is:
Fedora Core 16
Kernel 3.6.7-4
g++ version 4.6.3
What I already know:
1. std::vector will re-allocate (doubling its size) as needed.
2. std::list (I beleive) is allocating its elements 1 at a time
3. both std::vector and std::list are using std::allocator by default to get their actual memory
4. The program is not leaking; valgrind has declared that no leaks are possible.
What I'm confused by:
1. Both std::vector and std::list are using std::allocator. Even if std::vector is doing batch re-allocations, wouldn't std::allocator be handing out memory in almost the same arrangement to std::list and std::vector? This program is single threaded after all.
2. Where can I learn about the behavior of Linux's memory allocation. I have heard statements about Linux keeping RAM assigned to a process even after it frees it, but I don't know if that behavior is guaranteed. Why does using std::vector impact that behavior so much?
Many thanks for reading this; I know this is a pretty fuzzy problem. The 'answer' I'm looking for here is if this behavior is 'defined' and where I can find its documentation.
#include <string.h>
#include <unistd.h>
#include <iostream>
#include <vector>
#include <list>
#include <iostream>
#include <memory>
class Foo{
public:
Foo()
{
data = new char[999];
memset(data, 'x', 999);
}
~Foo()
{
delete[] data;
}
private:
char* data;
};
int main(int argc, char** argv)
{
for(int x=0; x<10; ++x)
{
sleep(1);
//std::auto_ptr<std::list<Foo*> > foos(new std::list<Foo*>);
std::auto_ptr<std::vector<Foo*> > foos(new std::vector<Foo*>);
for(int i=0; i<2000000; ++i)
{
foos->push_back(new Foo());
}
std::cout << "Sleeping before de-alloc\n";
sleep(5);
while(false == foos->empty())
{
delete foos->back();
foos->pop_back();
}
}
std::cout << "Sleeping after final de-alloc\n";
sleep(5);
}
The freeing of memory is done on a "chunk" basis. It's quite possible that when you use list, the memory gets fragmented into little tiny bits.
When you allocate using a vector, all elements are stored in one big chunk, so it's easy for the memory freeing code to say "Golly, i've got a very large free region here, I'm going to release it back to the OS". It's also entirely possible that when growing the vector, the memory allocator goes into "large chunk mode", which uses a different allocation method than "small chunk mode" - say for example you allocate more than 1MB, the memory allocation code may see that as a good time to start using a different strategy, and just ask the OS for a "perfect fit" piece of memory. This large block is very easy to release back to he OS when it's being freed.
On the ohter hand if you are adding to a list, you are constantly asking for little bits, so the allocator uses a different strategy of asking for large block and then giving out small portions. It's both difficult and time-consuming to ensure that ALL blocks within a chunk have been freed, so the allocator may well "not bother" - because chances are that there are some regions in there "still in use", and then it can't be freed at all anyways.
I would also add that using "top" as a memory measure isn't a particularly accurate method, and is very unreliable, as it very much depends on what the OS and the runtime library does. Memory belonging to a process may not be "resident", but the process still hasn't freed it - it's just not "present in actual memory" (out in the swap partition instead!)
And to your question "is this defined somewhere", I think it is in the sense that the C/C++ library source cod defines it. But it's not defined in the sense that somewhere it's written that "This is how it's meant to work, and we promise never to hange it". The libraries supplied as glibc and libstdc++ are not going to say that, they will change the internals of malloc, free, new and delete as new technologies and ideas are invented - some may make things better, others may make it worse, for a given scenario.
As has been pointed out in the comments, the memory is not locked to the process. If the kernel feels that the memory is better used for something else [and the kernel is omnipotent here], then it will "steal" the memory from one running process and give it to another. Particularly memory that hasn't been "touched" for a long time.
1 . Both std::vector and std::list are using std::allocator. Even if std::vector is doing batch re-allocations, wouldn't std::allocator be handing out memory in almost the same arrangement to std::list and std::vector? This program is single threaded after all.
Well, what are the differences?
std::list
allocates nodes one-by-one (each node needs two pointers in addition to your Foo *
). Also, it never re-allocates these nodes (this is guaranteed by the iterator invalidation requirements for list
). So, the std::allocator
will request a sequence of fixed-size chunks from the underlying mechanism (probably malloc
which will in turn use the sbrk
or mmap
system calls). These fixed-size chunks may well be larger than a list node, but if so they'll all be the same default chunk size used by std::allocator
.
std::vector
allocates a contiguous block of pointers with no book-keeping overhead (that's all in the vector parent object). Every time a push_back
would overflow the current allocation, the vector will allocate a new, larger chunk, move everything across to the new chunk, and release the old one. Now, the new chunk will be something like double (or 1.6 times, or whatever) the size of the old one, as is required to keep the amortized constant time guarantee for push_back
. So, pretty quickly, I'd expect the sizes it requests to exceed any sensible default chunk size for std::allocator
.
So, the the interesting interactions are different: one between between std::vector
and the allocator's underlying mechanism, and one between the std::allocator
itself and that underlying mechanism.
2 . Where can I learn about the behavior of Linux's memory allocation. I have heard statements about Linux keeping RAM assigned to a process even after it frees it, but I don't know if that behavior is guaranteed. Why does using std::vector impact that behavior so much?
There are several levels you might care about:
std::allocator
itself, which may provide a layer of buffering for small allocations
std::allocator
implementation (it could for example be malloc
, however that is implemented by your libc)In your particular case, I can think of a possible explanation for the vector apparently releasing more memory than the list.
Consider that the vector ends up with a single contiguous allocation, and lots of the Foo
s will also be allocated contiguously. This means that when you release all this memory, it's pretty easy to figure out that most of the underlying pages are genuinely free.
Now consider that the list node allocations are interleaved 1:1 with the Foo
instances. Even if the allocator did some batching, it seems likely that the heap is much more fragmented than in the std::vector
case. Therefore, when you release the allocated records, some work would be required to figure out whether an underlying page is now free, and there's no particular reason to expect this will happen (unless a subsequent large allocation encouraged coalescing of heap records).
The answer is the malloc "fastbins" optimization. std::list creates tiny (less then 64 bytes) allocations and when it frees them up they are not actually freed - but goes to the fastblock pool. This behavior means that the heap stays fragmented even AFTER the list is cleared and therefore it does not return to the system.
You can either use malloc_trim(128*1024) in order to forcibly clear them. Or use mallopt(M_MXFAST, 0) in order to disable fastbins altogether.
I find the first solution to be more correct if you call it when you really don't need the memory anymore.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With