Does the typical malloc
(for x86-64 platform and Linux OS) naively lock a mutex at the beginning and release it when done, or does it lock a mutex in a more clever way at a finer level, so that lock contention is reduced? If it indeed does it the second way, how does it do it?
malloc() and free() are not thread-safe functions. You need to protect the calls to those functions with a mutex. You need to protect all shared variables with a mutex as well.
None of the common versions of malloc allow you to re-enter it (e.g. from a signal handler). Note that a reentrant routine may not use locks, and almost all malloc versions in existence do use locks (which makes them thread-safe), or global/static variables (which makes them thread-unsafe and non-reentrant).
In C, the library function malloc is used to allocate a block of memory on the heap. The program accesses this block of memory via a pointer that malloc returns. When the memory is no longer needed, the pointer is passed to free which deallocates the memory so that it can be used for other purposes.
So, malloc uses a lock to make the threads "take turns" accessing malloc's internal structures.
glibc 2.15
operates multiple allocation arenas. Each arena has its own lock. When a thread needs to allocate memory, malloc()
picks an arena, locks it, and allocates memory from it.
The mechanism for choosing an arena is somewhat elaborate and is aimed at reducing lock contention:
/* arena_get() acquires an arena and locks the corresponding mutex. First, try the one last locked successfully by this thread. (This is the common case and handled with a macro for speed.) Then, loop once over the circularly linked list of arenas. If no arena is readily available, create a new one. In this latter case, `size' is just a hint as to how much memory will be required immediately in the new arena. */
With this in mind, malloc()
basically looks like this (edited for brevity):
mstate ar_ptr; void *victim; arena_lookup(ar_ptr); arena_lock(ar_ptr, bytes); if(!ar_ptr) return 0; victim = _int_malloc(ar_ptr, bytes); if(!victim) { /* Maybe the failure is due to running out of mmapped areas. */ if(ar_ptr != &main_arena) { (void)mutex_unlock(&ar_ptr->mutex); ar_ptr = &main_arena; (void)mutex_lock(&ar_ptr->mutex); victim = _int_malloc(ar_ptr, bytes); (void)mutex_unlock(&ar_ptr->mutex); } else { /* ... or sbrk() has failed and there is still a chance to mmap() */ ar_ptr = arena_get2(ar_ptr->next ? ar_ptr : 0, bytes); (void)mutex_unlock(&main_arena.mutex); if(ar_ptr) { victim = _int_malloc(ar_ptr, bytes); (void)mutex_unlock(&ar_ptr->mutex); } } } else (void)mutex_unlock(&ar_ptr->mutex); return victim;
This allocator is called ptmalloc
. It is based on earlier work by Doug Lea, and is maintained by Wolfram Gloger.
Doug Lea's malloc
used coarse locking (or no locking, depending on the configuration settings), where every call to malloc
/realloc
/free
is protected by a global mutex. This is safe but can be inefficient in highly multithreaded environments.
ptmalloc3
, which is the default malloc
implementation in the GNU C library (libc) used on most Linux systems these days, has a more fine-grained strategy, as described in aix's answer, which allows multiple threads to concurrently allocate memory safely.
nedmalloc
is another independent implementation which claims even better multithreaded performance than ptmalloc3
and various other allocators. I don't know how it works, and there doesn't seem to be any obvious documentation, so you'll have to check the source code to see how it works.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With