
How should a multi-threaded C application handle a failed malloc()?

Tags: c, malloc, pthreads

Part of an application I'm working on is a simple pthread-based server that communicates over a TCP/IP socket. I am writing it in C because it's going to be running in a memory-constrained environment. My question is: what should the program do if one of the threads encounters a malloc() that returns NULL? Possibilities I've come up with so far:

  1. No special handling. Let malloc() return NULL and let it be dereferenced so that the whole thing segfaults.
  2. Exit immediately on a failed malloc(), by calling abort() or exit(-1). Assume that the environment will clean everything up.
  3. Jump out of the main event loop and attempt to pthread_join() all the threads, then shut down.

The first option is obviously the easiest, but seems very wrong. The second one also seems wrong since I don't know exactly what will happen. The third option seems tempting except for two issues: first, all of the threads need not be joined back to the main thread under normal circumstances and second, in order to complete the thread execution, most of the remaining threads will have to call malloc() again anyway.

What shall I do?

ipartola asked May 14 '10 11:05


3 Answers

This is one of the reasons that space / rad-hard systems generally forbid dynamic memory allocation. When malloc() fails, it's extremely hard to 'cure' the failure. You do have some options:

  • You are not required to use the built-in libc malloc() (at all, or as usual). You can wrap malloc() to do extra work on failures, such as notifying something else. This is helpful when using something like a watchdog. You can also use a full-blown garbage collector, though I don't recommend it. It's better to identify and fix leaks.
  • Depending on storage and complexity, infrequently accessed allocated blocks could be mapped to disk. But here, typically, you are only looking at a few KB of savings in physical memory.
  • You can use a static pool of memory and your own malloc() that won't oversell it. If you have profiled your heap usage extensively (using a tool like Valgrind's massif or similar), you can reasonably size the pool.
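As a rough illustration of the static pool idea, here is a minimal bump allocator guarded by a mutex so several threads can share it. The names `pool_alloc` and `pool_reset` and the 64 KB size are assumptions made for this sketch; a real pool would be sized from profiling data and would probably need per-block freeing.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define POOL_SIZE (64 * 1024)   /* assumed size; pick the real one from profiling */

/* The rounding in pool_alloc() relies on the pool size being a multiple of the alignment. */
static_assert(POOL_SIZE % _Alignof(max_align_t) == 0, "pool size must be aligned");

static _Alignas(max_align_t) unsigned char pool[POOL_SIZE];
static size_t pool_used = 0;
static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hand out the next aligned chunk, or NULL if the pool cannot satisfy the request. */
void *pool_alloc(size_t size)
{
    const size_t align = _Alignof(max_align_t);
    void *p = NULL;

    if (size == 0 || size > POOL_SIZE)
        return NULL;
    /* Round up so every returned pointer is suitably aligned for any type. */
    size = (size + align - 1) & ~(align - 1);

    pthread_mutex_lock(&pool_lock);
    if (size <= POOL_SIZE - pool_used) {
        p = &pool[pool_used];
        pool_used += size;
    }
    pthread_mutex_unlock(&pool_lock);
    return p;
}

/* Release everything at once, for example after a request has been fully handled. */
void pool_reset(void)
{
    pthread_mutex_lock(&pool_lock);
    pool_used = 0;
    pthread_mutex_unlock(&pool_lock);
}
```

Because the pool is a fixed array, it can never promise more memory than it actually has, which is the property the static pool suggestion is after.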

However, what most of those suggestions boil down to is not trusting / using the system malloc() if failure is not an option.

In your case, I think the best thing you can do is make sure a watchdog is notified in the event that malloc() fails, so that your process (or the whole system) can be restarted. You don't want it looking 'alive and running' while it is actually deadlocked. This could be as simple as just unlinking a file.
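One way to sketch the "unlink a file" signal: the server creates a marker file at startup and removes it when an allocation fails, and an external watchdog (assumed here to poll for the file) restarts the process once the marker disappears. The function names are hypothetical.

```c
#include <fcntl.h>
#include <unistd.h>

/* Create (or truncate) the marker file at startup. Returns 0 on success, -1 on error. */
int heartbeat_start(const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    return close(fd);
}

/* Remove the marker so the watchdog notices. unlink() is a plain system call,
 * so it should still work when the heap is exhausted. Returns 0 or -1. */
int heartbeat_stop(const char *path)
{
    return unlink(path);
}
```

A failed-allocation path could call `heartbeat_stop()` with whatever path the watchdog is configured to watch, then exit.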

Write very detailed logs. In what file, line, and function did the failure happen?
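A small wrapper can capture the call site automatically. This is only a sketch: `xmalloc_at` and `XMALLOC` are made-up names, and logging to stderr is an assumption (a server might prefer syslog or its own log file).

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: malloc() that reports the call site when it fails. */
static void *xmalloc_at(size_t size, const char *file, int line, const char *func)
{
    void *p = malloc(size);
    if (p == NULL && size != 0) {
        /* stderr is unbuffered by default, so this is unlikely to need more heap. */
        fprintf(stderr, "%s:%d: %s: malloc(%zu) failed\n", file, line, func, size);
    }
    return p;
}

/* Expands at the call site, so the log shows where the allocation was requested. */
#define XMALLOC(size) xmalloc_at((size), __FILE__, __LINE__, __func__)
```

The caller still has to check the result for NULL; the wrapper only makes sure the failure leaves a trace.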

If malloc() fails when trying to get just a few KB, it's a good sign that your process really can't continue reliably anyway. If it fails grabbing a few hundred MB, you may be able to recover and keep going. By that token, whatever action you take should be based on just how much memory you were trying to get, and on whether calls to allocate a much smaller size still succeed.
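That decision could look roughly like the following. The 16 KB threshold, the 256-byte probe, and the names are all assumptions for illustration; sensible values depend on the application.

```c
#include <stdlib.h>

#define SMALL_REQUEST (16 * 1024)  /* assumed cutoff for "just a few KB" */
#define PROBE_SIZE    256          /* assumed size of the follow-up test allocation */

enum oom_action { OOM_FATAL, OOM_RECOVERABLE };

/* Decide what to do after malloc(failed_size) has returned NULL. */
enum oom_action classify_oom(size_t failed_size)
{
    /* A small request failing means the heap is effectively exhausted. */
    if (failed_size <= SMALL_REQUEST)
        return OOM_FATAL;

    /* A large request failed: check whether small allocations still work.
     * The volatile write keeps the compiler from optimizing the probe away. */
    volatile char *probe = malloc(PROBE_SIZE);
    if (probe == NULL)
        return OOM_FATAL;
    probe[0] = 0;
    free((void *)probe);
    return OOM_RECOVERABLE;
}
```

On systems that overcommit memory, a successful probe is only a hint, so the result is best treated as advice rather than a guarantee.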

The one thing you never want to do is just operate on NULL pointers and let it crash. It's just sloppy, provides no useful logging of where things went wrong, and gives the impression that your software is of low / unstable quality.

Tim Post answered Nov 09 '22 19:11


There's nothing wrong with option 2. You don't have to assume anything: exit() terminates the process, which means all the threads are torn down and everything is cleaned up.

Don't forget to try to log where the failed allocation occurred.

caf answered Nov 09 '22 19:11


There's a fourth option: free some memory (caches are always good candidates) and try again.
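A minimal sketch of "free some memory and try again", assuming the application keeps a small table of droppable buffers. The cache layout and the names `cache_purge` and `malloc_retry` are invented for this example.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical cache: a few buffers that can be thrown away at any time. */
#define CACHE_SLOTS 8
static void *cache[CACHE_SLOTS];
static pthread_mutex_t cache_lock = PTHREAD_MUTEX_INITIALIZER;

/* Free every cached buffer; returns how many were released. */
static size_t cache_purge(void)
{
    size_t freed = 0;
    pthread_mutex_lock(&cache_lock);
    for (size_t i = 0; i < CACHE_SLOTS; i++) {
        if (cache[i] != NULL) {
            free(cache[i]);
            cache[i] = NULL;
            freed++;
        }
    }
    pthread_mutex_unlock(&cache_lock);
    return freed;
}

/* Try malloc(); on failure, purge the cache once and try a second time. */
void *malloc_retry(size_t size)
{
    void *p = malloc(size);
    if (p == NULL && cache_purge() > 0)
        p = malloc(size);
    return p;
}
```

Retrying only once, and only when something was actually freed, keeps the function from looping forever when memory is truly gone.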

If you cannot afford this, I'd choose option 2 (logging or printing some kind of error message, obviously). The only concern about cleanup would be closing the open network connections in an orderly manner, so that the clients know the application on the other side is shutting down rather than running into an unexpected connectivity problem.
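For the orderly close, something along these lines could run just before exiting. It assumes the server tracks its client sockets in a plain array with negative values marking unused slots; `close_all_clients` is a hypothetical name.

```c
#include <sys/socket.h>
#include <unistd.h>

/* Shut down and close every open client socket in fds[0..count-1].
 * Slots holding a negative value are skipped. Returns the number closed. */
int close_all_clients(int *fds, int count)
{
    int closed = 0;
    for (int i = 0; i < count; i++) {
        if (fds[i] < 0)
            continue;
        /* shutdown() tells the peer the connection is ending, so it sees
         * end-of-file instead of a connection that simply goes silent. */
        shutdown(fds[i], SHUT_RDWR);
        close(fds[i]);
        fds[i] = -1;
        closed++;
    }
    return closed;
}
```

Neither shutdown() nor close() needs heap memory from the caller, so this should remain usable after a failed malloc().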

fortran answered Nov 09 '22 19:11