
Malloc on linux without overcommitting

How can I allocate memory on Linux without overcommitting, so that malloc actually returns NULL if no memory is available and the process doesn't randomly crash on access?

My understanding of how malloc works:

  1. The allocator checks the freelist if there is free memory. If yes, the memory is allocated.
  2. If no, new pages are allocated from the kernel. This would be where overcommit can happen. Then the new memory is returned.

So if there is a way to get memory from the kernel that is immediately backed by physical memory, the allocator could use that instead of getting overcommitted pages, and return NULL if the kernel refuses to give more memory.

Is there a way this can be done?

Update:

I understand that this cannot fully protect the process from the OOM killer because it will still be killed in an out of memory situation if it has a bad score, but that is not what I'm worried about.

Update 2: Nominal Animal's comment gave me the following idea of using mlock:

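#include <stdlib.h>
#include <sys/mman.h>

/* Allocate memory and lock it into RAM so that it is backed immediately. */
void *malloc_without_overcommit(size_t size) {
    void *pointer = malloc(size);
    if (pointer == NULL) {
        return NULL;
    }
    /* mlock faults the pages in and pins them; on failure, give the memory back. */
    if (mlock(pointer, size) != 0) {
        free(pointer);
        return NULL;
    }

    return pointer;
}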

But this is probably quite slow because of the extra system call on every allocation, so it should probably be done at the level of the allocator implementation. It also prevents the locked memory from being swapped out.

Update 3:

New idea, following John Bollinger's comments:

  1. Check if enough memory is available. From what I understand this has to be checked in /proc/meminfo in the MemFree and SwapFree values.
  2. Only if enough space is available (plus an additional safety margin), allocate the memory.
  3. Find out the pagesize with getpagesize and write one byte to the memory every pagesize, so that it gets backed by physical memory (either RAM or swap).

I also looked more closely at mmap(2) and found the following:

MAP_NORESERVE

Do not reserve swap space for this mapping. When swap space is reserved, one has the guarantee that it is possible to modify the mapping. When swap space is not reserved one might get SIGSEGV upon a write if no physical memory is available. See also the discussion of the file /proc/sys/vm/overcommit_memory in proc(5). In kernels before 2.6, this flag only had effect for private writable

Does this imply that mmapping with ~MAP_NORESERVE (that is, without the MAP_NORESERVE flag) will completely protect the process from the OOM killer? If so, this would be the perfect solution, as long as there is a malloc implementation that can work directly on top of mmap (maybe jemalloc?).

Update 4: My current understanding is that ~MAP_NORESERVE will not protect against the OOM killer but at least against segfaulting on first write to the memory.

FSMaxB asked Feb 02 '18 14:02



1 Answer

How can I allocate memory on Linux without overcommitting

That is a loaded question, or at least an incorrect one. The question is based on an incorrect assumption, which makes answering the stated question irrelevant at best, misleading at worst.

Memory overcommitment is a system-wide policy, because it determines how much virtual memory is made available to processes. It is not something a process can decide for itself.

It is up to the system administrator to determine whether memory is overcommitted or not. In Linux, the policy is quite tunable (see e.g. /proc/sys/vm/overcommit_memory in man 5 proc). There is nothing a process can do during allocation that would affect the memory overcommit policy.
 

OP also seems interested in making their processes immune to the out-of-memory killer (OOM killer) in Linux. (OOM killer in Linux is a technique used to relieve memory pressure, by killing processes, and thus releasing their resources back to the system.)

This too is an incorrect approach, because the OOM killer is a heuristic process, whose purpose is not to "punish or kill badly behaving processes", but to keep the system operational. This facility is also quite tunable in Linux, and the system admin can even tune the likelihood of each process being killed in high memory pressure situations. Other than the amount of memory used by a process, it is not up to the process to affect whether the OOM killer will kill it during out-of-memory situations; it too is a policy issue managed by the system administrator, and not the processes themselves.
 

I assume that the actual problem the OP is trying to solve is how to write Linux applications or services that can respond dynamically to memory pressure, rather than just dying (due to SIGSEGV or the OOM killer). The answer is that you do not; you let the system administrator worry about what is important to them in their workload. The exception is an application or service that uses lots and lots of memory and is therefore likely to be unfairly killed during high memory pressure. (This applies especially if the dataset is large enough to require enabling a much larger amount of swap than would otherwise be enabled, which raises the risk of a swap storm and a late-but-too-strong OOM killer.)

The solution, or at least the approach that works, is to memory-lock the critical parts (or even the entire application/service, if it works on sensitive data that should not be swapped to disk), or to use a memory map with a dedicated backing file. (For the latter, here is an example I wrote in 2011, that manipulates a terabyte-sized data set.)
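As a rough sketch of the memory-locking approach, the hypothetical helpers below either lock a single critical buffer or the whole process. They assume the process is allowed to lock that much memory (RLIMIT_MEMLOCK or CAP_IPC_LOCK); otherwise the calls fail and the helpers report it.

```c
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Round size up to a whole number of pages. */
static size_t round_to_pages(size_t size)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    return (size + page - 1) / page * page;
}

/* Hypothetical helper: allocate a page-aligned buffer and lock it
 * into RAM. Returns NULL if allocation or locking fails. */
void *alloc_locked(size_t size)
{
    void *buffer = NULL;
    size_t page = (size_t)sysconf(_SC_PAGESIZE);

    if (size == 0 || posix_memalign(&buffer, page, round_to_pages(size)) != 0) {
        return NULL;
    }
    if (mlock(buffer, round_to_pages(size)) != 0) {
        free(buffer);
        return NULL;
    }
    return buffer;
}

/* Unlock and free a buffer obtained from alloc_locked. */
void free_locked(void *buffer, size_t size)
{
    if (buffer == NULL) {
        return;
    }
    munlock(buffer, round_to_pages(size));
    free(buffer);
}

/* Lock everything the process has mapped now and maps later.
 * Returns 0 on success, -1 with errno set on failure. */
int lock_whole_process(void)
{
    return mlockall(MCL_CURRENT | MCL_FUTURE);
}
```

Locking only the critical buffers keeps the locked footprint small. `lock_whole_process` is the heavier option for services that must never be swapped out.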

The OOM killer can still kill the process, and a SIGSEGV can still occur (due to, say, an internal allocation by a library function that the kernel fails to back with RAM) unless all of the application is locked into RAM, but at least the service/process is no longer unfairly targeted just because it uses lots of memory.

It is possible to catch the SIGSEGV signal (that occurs when there is no memory available to back the virtual memory), but thus far I have not seen a use case that would warrant the code complexity and maintenance effort required.
 

In summary, the proper answer to the stated question is no, don't do that.

Nominal Animal answered Sep 23 '22 14:09
