
Linux Memory Usage in top when using std::vector versus std::list

I have noticed some interesting behavior in Linux with regard to the Memory Usage (RES) reported by top. I have attached the following program, which allocates a couple million objects on the heap, each of which has a buffer that is around 1 kilobyte. The pointers to those objects are tracked by either a std::list or a std::vector. The interesting behavior I have noticed is that if I use a std::list, the Memory Usage reported by top never changes during the sleep periods. However, if I use std::vector, the memory usage will drop to near 0 during those sleeps.

My test configuration is:
Fedora Core 16
Kernel 3.6.7-4
g++ version 4.6.3

What I already know:
1. std::vector will re-allocate (doubling its size) as needed.
2. std::list (I believe) is allocating its elements one at a time
3. both std::vector and std::list are using std::allocator by default to get their actual memory
4. The program is not leaking; valgrind has declared that no leaks are possible.

What I'm confused by:
1. Both std::vector and std::list are using std::allocator. Even if std::vector is doing batch re-allocations, wouldn't std::allocator be handing out memory in almost the same arrangement to std::list and std::vector? This program is single threaded after all.
2. Where can I learn about the behavior of Linux's memory allocation? I have heard statements about Linux keeping RAM assigned to a process even after it frees it, but I don't know if that behavior is guaranteed. Why does using std::vector impact that behavior so much?

Many thanks for reading this; I know this is a pretty fuzzy problem. The 'answer' I'm looking for here is whether this behavior is 'defined' and where I can find its documentation.

#include <string.h>
#include <unistd.h>
#include <iostream>
#include <vector>
#include <list>
#include <memory>

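// Each Foo owns a ~1 KB heap-allocated buffer.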
class Foo{
public:
    Foo()
    {
        data = new char[999];
        memset(data, 'x', 999);
    }

    ~Foo()
    {
        delete[] data;
    }

private:
    char* data;

};

int main(int argc, char** argv)
{
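    // Repeat the allocate/free cycle, pausing so RES can be watched in top.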
    for(int x=0; x<10; ++x)
    {
        sleep(1);
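        // Track the Foo pointers with either a std::list or a std::vector: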
        //std::auto_ptr<std::list<Foo*> > foos(new std::list<Foo*>);
        std::auto_ptr<std::vector<Foo*> > foos(new std::vector<Foo*>);
        for(int i=0; i<2000000; ++i)
        {
            foos->push_back(new Foo());
        }
        std::cout << "Sleeping before de-alloc\n";
        sleep(5);
        while(false == foos->empty())
        {
            delete foos->back();
            foos->pop_back();
        }
    }
    std::cout << "Sleeping after final de-alloc\n";
    sleep(5);
}
asked Dec 31 '12 by user1940267

3 Answers

The freeing of memory is done on a "chunk" basis. It's quite possible that when you use a list, the memory gets fragmented into tiny little bits.

When you allocate using a vector, all elements are stored in one big chunk, so it's easy for the memory-freeing code to say "Golly, I've got a very large free region here, I'm going to release it back to the OS". It's also entirely possible that when growing the vector, the memory allocator goes into "large chunk mode", which uses a different allocation method than "small chunk mode": say, for example, you allocate more than 1 MB; the memory allocation code may see that as a good time to start using a different strategy, and just ask the OS for a "perfect fit" piece of memory. This large block is very easy to release back to the OS when it's freed.
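
In glibc specifically, this "large chunk mode" is the mmap threshold: requests at or above M_MMAP_THRESHOLD (128 KiB by default) are serviced with mmap and handed straight back to the OS on free, while smaller requests are carved out of the heap arena. A minimal sketch, assuming glibc's <malloc.h> extensions:

#include <malloc.h>   // glibc-specific: mallopt, M_MMAP_THRESHOLD
#include <cstdlib>

int main()
{
    // Lower the threshold so even 64 KiB requests are mmap-backed.
    mallopt(M_MMAP_THRESHOLD, 64 * 1024);

    void* big = std::malloc(1024 * 1024); // mmap-backed "perfect fit" chunk
    std::free(big);                       // munmap'd: RES drops immediately

    void* small = std::malloc(64);        // ordinary heap (sbrk) chunk
    std::free(small);                     // recycled inside the arena; RES unchanged
}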

On the other hand, if you are adding to a list, you are constantly asking for little bits, so the allocator uses a different strategy: it asks for a large block and then hands out small portions of it. It's both difficult and time-consuming to ensure that ALL blocks within a chunk have been freed, so the allocator may well "not bother" - because chances are that there are some regions in there "still in use", in which case the chunk can't be freed at all anyway.

I would also add that using "top" as a memory measure isn't particularly accurate, and can be quite unreliable, since it very much depends on what the OS and the runtime library do. Memory belonging to a process may not be "resident" even though the process hasn't freed it - it's just not "present in actual memory" (it's out in the swap partition instead!).
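
If you want to track residency from inside the program instead of eyeballing top, you can read VmRSS from /proc/self/status (Linux-specific). A minimal sketch:

#include <fstream>
#include <iostream>
#include <string>

// Print this process's resident set size - the figure top reports as RES.
void print_rss()
{
    std::ifstream status("/proc/self/status");
    std::string line;
    while (std::getline(status, line))
    {
        if (line.compare(0, 6, "VmRSS:") == 0)
        {
            std::cout << line << '\n';
            return;
        }
    }
}

int main()
{
    print_rss(); // e.g. "VmRSS:     1234 kB"
}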

And to your question "is this defined somewhere": I think it is, in the sense that the C/C++ library source code defines it. But it's not defined in the sense that somewhere it's written "this is how it's meant to work, and we promise never to change it". The libraries supplied as glibc and libstdc++ make no such promise; they will change the internals of malloc, free, new and delete as new technologies and ideas are invented - some may make things better, others may make them worse, for a given scenario.

As has been pointed out in the comments, the memory is not locked to the process. If the kernel feels that the memory is better used for something else [and the kernel is omnipotent here], then it will "steal" the memory from one running process and give it to another. Particularly memory that hasn't been "touched" for a long time.

answered by Mats Petersson

1. Both std::vector and std::list are using std::allocator. Even if std::vector is doing batch re-allocations, wouldn't std::allocator be handing out memory in almost the same arrangement to std::list and std::vector? This program is single threaded after all.

Well, what are the differences?

  • std::list allocates nodes one-by-one (each node needs two pointers in addition to your Foo *). Also, it never re-allocates these nodes (this is guaranteed by the iterator invalidation requirements for list). So, the std::allocator will request a sequence of fixed-size chunks from the underlying mechanism (probably malloc, which will in turn use the sbrk or mmap system calls). These fixed-size chunks may well be larger than a list node, but if so they'll all be the same default chunk size used by std::allocator.

  • std::vector allocates a contiguous block of pointers with no book-keeping overhead (that's all in the vector parent object). Every time a push_back would overflow the current allocation, the vector will allocate a new, larger chunk, move everything across to the new chunk, and release the old one. Now, the new chunk will be something like double (or 1.6 times, or whatever) the size of the old one, as is required to keep the amortized constant time guarantee for push_back. So, pretty quickly, I'd expect the sizes it requests to exceed any sensible default chunk size for std::allocator.
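
You can watch that geometric growth directly by logging every capacity change. A minimal sketch:

#include <cstddef>
#include <iostream>
#include <vector>

int main()
{
    std::vector<int> v;
    std::size_t last_cap = v.capacity();
    for(int i=0; i<1000000; ++i)
    {
        v.push_back(i);
        if(v.capacity() != last_cap) // a re-allocation just happened
        {
            last_cap = v.capacity();
            std::cout << "size=" << v.size()
                      << " capacity=" << last_cap << '\n';
        }
    }
}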

So, the interesting interactions are different: one between std::vector and the allocator's underlying mechanism, and one between std::allocator itself and that underlying mechanism.

2. Where can I learn about the behavior of Linux's memory allocation? I have heard statements about Linux keeping RAM assigned to a process even after it frees it, but I don't know if that behavior is guaranteed. Why does using std::vector impact that behavior so much?

There are several levels you might care about:

  1. The container's own allocation pattern, which is hopefully described above (a logging allocator makes the pattern visible; see the sketch after this list)
    • note that in real-world applications, the way a container is used is just as important
  2. std::allocator itself, which may provide a layer of buffering for small allocations
    • I don't think this is required by the standard, so it's specific to your implementation
  3. The underlying allocator, which depends on your std::allocator implementation (it could for example be malloc, however that is implemented by your libc)
  4. The VM scheme used by the kernel, and its interactions with whatever syscall (3) ultimately uses
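
To make level (1) concrete, here is a minimal sketch of a logging allocator that prints each container's requests. It uses the C++11 minimal-allocator interface, so it assumes a newer compiler than the g++ 4.6.3 in the question. std::list emits a long stream of identical node-sized requests; std::vector emits a few geometrically growing ones.

#include <cstddef>
#include <cstdlib>
#include <iostream>
#include <list>
#include <vector>

template <typename T>
struct LoggingAllocator
{
    using value_type = T;

    LoggingAllocator() = default;
    template <typename U>
    LoggingAllocator(const LoggingAllocator<U>&) {}

    T* allocate(std::size_t n)
    {
        std::cout << "allocate   " << n * sizeof(T) << " bytes\n";
        return static_cast<T*>(std::malloc(n * sizeof(T)));
    }

    void deallocate(T* p, std::size_t n)
    {
        std::cout << "deallocate " << n * sizeof(T) << " bytes\n";
        std::free(p);
    }
};

template <typename T, typename U>
bool operator==(const LoggingAllocator<T>&, const LoggingAllocator<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const LoggingAllocator<T>&, const LoggingAllocator<U>&) { return false; }

int main()
{
    std::list<int, LoggingAllocator<int>> l;   // many identical node-sized requests
    for(int i=0; i<8; ++i) l.push_back(i);

    std::vector<int, LoggingAllocator<int>> v; // a few geometrically growing requests
    for(int i=0; i<8; ++i) v.push_back(i);
}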

In your particular case, I can think of a possible explanation for the vector apparently releasing more memory than the list.

Consider that the vector ends up with a single contiguous allocation, and lots of the Foos will also be allocated contiguously. This means that when you release all this memory, it's pretty easy to figure out that most of the underlying pages are genuinely free.

Now consider that the list node allocations are interleaved 1:1 with the Foo instances. Even if the allocator did some batching, it seems likely that the heap is much more fragmented than in the std::vector case. Therefore, when you release the allocated records, some work would be required to figure out whether an underlying page is now free, and there's no particular reason to expect this will happen (unless a subsequent large allocation encouraged coalescing of heap records).
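
A quick way to see that interleaving is to print where the list nodes and the payload buffers actually land. A rough sketch (the exact addresses will vary by allocator and run):

#include <iostream>
#include <list>

int main()
{
    std::list<char*> foos;
    for(int i=0; i<5; ++i)
    {
        char* buf = new char[999];   // the Foo-sized payload
        foos.push_back(buf);
        // &foos.back() points into the freshly allocated list node, so the
        // two addresses show how the allocation streams interleave on the heap.
        std::cout << "node @ " << static_cast<void*>(&foos.back())
                  << "  buffer @ " << static_cast<void*>(buf) << '\n';
    }
    while(!foos.empty())
    {
        delete[] foos.back();
        foos.pop_back();
    }
}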

answered by Useless


The answer is the malloc "fastbins" optimization. std::list creates tiny (less than 64 bytes) allocations, and when it frees them they are not actually released - they go to the fastbin pool instead. This behavior means that the heap stays fragmented even AFTER the list is cleared, and therefore the memory does not return to the system.

You can either call malloc_trim(128*1024) to forcibly clear them, or call mallopt(M_MXFAST, 0) to disable fastbins altogether.

I find the first solution more correct, provided you call it at a point where you really don't need the memory anymore.
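
Applied to a list workload like the question's, either knob would look roughly like this (glibc-specific; both mallopt and malloc_trim live in <malloc.h>):

#include <malloc.h>   // glibc-specific: mallopt, malloc_trim, M_MXFAST
#include <list>

int main()
{
    // Option 2: disable fastbins entirely; must run before any allocation.
    // mallopt(M_MXFAST, 0);

    {
        std::list<int> foos;
        for(int i=0; i<2000000; ++i)
            foos.push_back(i);
    } // list destroyed; without intervention the freed chunks sit in fastbins

    // Option 1: consolidate free chunks and return what it can to the OS,
    // keeping 128 KiB of padding at the top of the heap.
    malloc_trim(128 * 1024);
}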

answered by Offer