Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Stack memory not released

I have the following loop, which pops a C++ concurrent queue I have, from the implementation here. https://juanchopanzacpp.wordpress.com/2013/02/26/concurrent-queue-c11/

while (!interrupted)
{
    pxData data = queue->pop(); 
    if (data.value == -1)
    { 
        break; // exit loop on terminating condition
     }
    usleep(7000); // stub to simulate processing
}

I am looking at the memory history using System Monitor in CentOS7. I'm trying to free up the memory taken up by the queue, after reading the value from the queue. However, as the following while loop runs, I don't see the memory usage going down. I've verified that the queue length does go down.

It does go down, however, when -1 is encountered and the loop exits. (program is still running) But I can't have this, because where usleep is, I want to do some intensive processing.

Question: Why doesn't the memory occupied by data get free-ed? (according to System Monitor) Isn't the stack allocated memory supposed to be free-ed when the variable goes out of scope?

The struct is defined as follows, and populated at the beginning of the program.

typedef struct pxData
{
  float value; // -1 value terminates the loop
  float x, y, z;
  std::complex<float> valueData[65536];
} pxData;

It's populated with ~10000 pxData, which roughly translates to 5GB. System only has ~8GB. So it's important that the memory is free-ed up for doing other processing in the system.

like image 424
lppier Avatar asked Dec 25 '22 11:12

lppier


1 Answers

There are a few things at play here.

Virtual Memory

First, you need to understand that just because your program is "using" 5 GB of memory does not mean that there are only 3 GB of RAM left for other programs. Virtual memory means that those 5 GB might be only 1 GB of actual "resident" data, and the other 4 GB may actually be on disk rather than in RAM. So it's important to look at the "resident set size" rather than the "virtual size" when you're looking at your program. And note that if your system actually runs low on RAM, the OS may shrink the RSS of some programs by "paging out" some of their memory. So don't worry too much about "5 GB" appearing in the system monitor--worry if you have a real, concrete performance problem.

Heap Allocation

The second aspect is why your virtual size does not decrease as you remove items from the queue. We can guess that you put those elements into the queue by creating them with malloc or new one-by-one, then pushing them onto the back of the queue. This means that the first element you allocated will come out of the queue first. And that in turn means that when you have drained 90% of the queue, your memory allocation might look like this:

[program|------------------unused-------------------|pxData]

The problem here is that in the real world, just because you free or delete something does not mean the operating system instantly reclaims that memory. In fact, it may not be able to reclaim any unused spans unless they are at the "end" (i.e. most recently allocated). Since C++ does not have garbage collection and cannot move items around in memory without your consent, you end up with this big "hole" in your program's virtual memory. That hole would be used to satisfy future memory allocation requests, but if you haven't got any, it just sits there, until the queue is completely empty:

[program|------------------unused--------------------------]

Then the system is able to shrink your virtual address space back down:

[program]

Which brings you back to where you started.

Solutions

If you want to "fix" this, one option is to allocate your memory in "reverse", i.e. put the last items allocated into the front of the queue.

Another option is to allocate the elements for the queue via mmap, which is something that e.g. Linux will do automatically for allocations which are "large." You can change the threshold for this by calling mallopt(3) with M_MMAP_THRESHOLD and setting it to be a little bit smaller than your struct size. This makes the allocations independent of each other, so the OS can reclaim them individually. This technique can even be applied to existing programs without recompilation, so is often useful if you need to solve this problem in a program you cannot modify.

like image 76
John Zwinck Avatar answered Jan 14 '23 16:01

John Zwinck