Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is it possible to partially free dynamically-allocated memory on a POSIX system?

I have a C++ application where I sometimes require a large buffer of POD types (e.g. an array of 25 billion floats) to be held in memory at once in a contiguous block. This particular memory organization is driven by the fact that the application makes use of some C APIs that operate on the data. Therefore, a different arrangement (such as a list of smaller chunks of memory like std::deque uses) isn't feasible.

The application has an algorithm that is run on the array in a streaming fashion; think something like this:

std::vector<float> buf(<very_large_size>);
for (size_t i = 0; i < buf.size(); ++i) do_algorithm(buf[i]);

This particular algorithm is the conclusion of a pipeline of earlier processing steps that have been applied to the dataset. Therefore, once my algorithm has passed over the i-th element in the array, the application no longer needs it.

In theory, therefore, I could free that memory in order to reduce my application's memory footprint as it chews through the data. However, doing something akin to a realloc() (or a std::vector<T>::shrink_to_fit()) would be inefficient because my application would have to spend its time copying the unconsumed data to the new spot at reallocation time.

My application runs on POSIX-compliant operating systems (e.g. Linux, OS X). Is there any interface by which I could ask the operating system to free only a specified region from the front of the block of memory? This would seem to be the most efficient approach, as I could just notify the memory manager that, for example, the first 2 GB of the memory block can be reclaimed once I'm done with it.

like image 269
Jason R Avatar asked Sep 16 '15 19:09

Jason R


3 Answers

If your entire buffer has to be in memory at once, then you probably will not gain much from freeing it partially later.

The main point on this post is basically to NOT tell you to do what you want to do, because the OS will not unnecessarily keep your application's memory in RAM if it's not actually needed. This is the difference between "resident memory usage" and "virtual memory usage". "Resident" is what is currently used and in RAM, "virtual" is the total memory usage of your application. And as long as your swap partition is large enough, "virtual" memory is pretty much a non-issue. [I'm assuming here that your system will not run out of virtual memory space, which is true in a 64-bit application, as long as you are not using hundreds of terabytes of virtual space!]

If you still want to do that, and want to have some reasonable portability, I would suggest building a "wrapper" that behaves kind of like std::vector and allocates lumps of some megabytes (or perhaps a couple of gigabytes) of memory at a time, and then something like:

 for (size_t i = 0; i < buf.size(); ++i) {
    do_algorithm(buf[i]);
    buf.done(i);
 }

The done method will simply check if the value if i is (one element) past the end of the current buffer, and free it. [This should inline nicely, and produce very little overhead on the average loop - assuming elements are actually used in linear order, of course].

I'd be very surprised if this gains you anything, unless do_algorithm(buf[i]) takes quite some time (certainly many seconds, probably many minutes or even hours). And of course, it's only going to help if you actually have something else useful to do with that memory. And even then, the OS will reclaim memory that isn't actively used by swapping it out to disk, if the system is short of memory.

In other words, if you allocate 100GB, fill it, leave it sitting without touching, it will eventually ALL be on the hard-disk rather than in RAM.

Further, it is not at all unusual that the heap in the application retains freed memory, and that the OS does not get the memory back until the application exits - and certainly, if only parts of a larger allocation is freed, the runtime will not release it until the whole block has been freed. So, as stated at the beginning, I'm not sure how much this will actually help your application.

As with everything regarding "tuning" and "performance improvements", you need to measure and compare a benchmark, and see how much it helps.

like image 90
Mats Petersson Avatar answered Nov 10 '22 15:11

Mats Petersson


Is it possible to partially free dynamically-allocated memory on a POSIX system?

You can not do it using malloc()/realloc()/free().

However, you can do it in a semi-portable way using mmap() and munmap(). The key point is that if you munmap() some page, malloc() can later use that page:

  • create an anonymous mapping using mmap();
  • subsequently call munmap() for regions that you don't need anymore.

The portability issues are:

  • POSIX doesn't specify anonymous mappings. Some systems provide MAP_ANONYMOUS or MAP_ANON flag. Other systems provide special device file that can be mapped for this purpose. Linux provides both.
  • I don't think that POSIX guarantees that when you munmap() a page, malloc() will be able to use it. But I think it'll work an all systems that have mmap()/unmap().

Update

If your memory region is so large that most pages surely will be written to swap, you will not loose anything by using file mappings instead of anonymous mappings. File mappings are specified in POSIX.

like image 42
gavv Avatar answered Nov 10 '22 15:11

gavv


If you can do without the convenience of std::vector (which won't give you much in this case anyway because you'll never want to copy / return / move that beast anyway), you can do your own memory handling. Ask the operating system for entire pages of memory (via mmap) and return them as appropriate (using munmap). You can tell mmap via its fist argument and the optional MAP_FIXED flag to map the page at a particular address (which you must ensure to be not otherwise occupied, of course) so you can build up an area of contiguous memory. If you allocate the entire memory upfront, then this is not an issue and you can do it with a single mmap and let the operating system choose a convenient place to map it. In the end, this is what malloc does internally. For platforms that don't have sys/mman.h, it's not difficult to fall back to using malloc if you can live with the fact that on those platforms, you won't return memory early.

I'm suspecting that if your allocation sizes are always multiples of the page size, realloc will be smart enough not to copy any data. You'd have to try this out and see if it works (or consult your malloc's documentation) on your particular target platform, though.

like image 2
5gon12eder Avatar answered Nov 10 '22 16:11

5gon12eder