One rule every programmer quickly learns about multithreading is:
If more than one thread has access to a data structure, and at least one of those threads might modify it, then you'd better serialize all accesses to that data structure, or you're in for a world of debugging pain.
Typically this serialization is done via a mutex -- i.e. a thread that wants to read or write the data structure locks the mutex, does whatever it needs to do, and then unlocks the mutex to make it available again to other threads.
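For concreteness, here is a minimal sketch of that pattern; the class and member names are made up for illustration, and std::mutex with std::lock_guard is just one common way to do the serialization:

```cpp
#include <cstddef>
#include <mutex>
#include <vector>

class SharedCounterList {
public:
    void add(int value) {
        std::lock_guard<std::mutex> lock(mutex_);  // acquire before touching the data
        values_.push_back(value);
    }                                              // lock released when lock_guard goes out of scope

    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mutex_);  // readers must lock too
        return values_.size();
    }

private:
    mutable std::mutex mutex_;   // serializes all access to values_
    std::vector<int> values_;
};
```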
Which brings me to the point: the memory heap of a process is a data structure that is accessible by multiple threads. Does this mean that every call to the default/non-overloaded new and delete is serialized by a process-global mutex, and is therefore a potential serialization bottleneck that can slow down multithreaded programs? Or do modern heap implementations avoid or mitigate that problem somehow, and if so, how do they do it?
(Note: I'm tagging this question linux, to avoid the correct-but-uninformative "it's implementation-dependent" response, but I'd also be interested in hearing how Windows and MacOS/X do it, if there are significant differences across implementations.)
On Windows, the HeapCreate documentation answers this directly. If HEAP_NO_SERIALIZE is not specified (the default), the heap serializes access within the calling process: serialization ensures mutual exclusion when two or more threads attempt to simultaneously allocate or free blocks from the same heap.
The HeapCreate function creates a private heap object from which the calling process can allocate memory blocks by using the HeapAlloc function. The function reserves space in the virtual address space of the process and allocates physical storage for a specified initial portion of this block: the initial size determines the number of committed pages that are allocated initially for the heap, and the maximum size determines the total number of reserved pages. The heap allocation options affect subsequent access to the new heap through calls to the heap functions.
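As a minimal sketch, assuming a heap that will only ever be touched by one thread (so skipping the internal lock is safe), a private non-serialized heap can be created and used like this; the sizes chosen here are arbitrary:

```cpp
// Windows-only sketch: a thread-private heap created with HEAP_NO_SERIALIZE.
// Safe only if this heap is never accessed by more than one thread.
#include <windows.h>
#include <cstdio>

int main() {
    // Skip the internal lock; 64 KiB initially committed, growable (max size = 0).
    HANDLE heap = HeapCreate(HEAP_NO_SERIALIZE, 64 * 1024, 0);
    if (heap == NULL) {
        std::printf("HeapCreate failed: %lu\n", GetLastError());
        return 1;
    }

    void* block = HeapAlloc(heap, 0, 256);   // allocate 256 bytes from the private heap
    if (block != NULL) {
        HeapFree(heap, 0, block);
    }

    HeapDestroy(heap);                       // releases the whole heap at once
    return 0;
}
```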
new and delete are thread-safe.
The following functions are required to be thread-safe:
- The library versions of operator new and operator delete
- User replacement versions of global operator new and operator delete
- std::calloc, std::malloc, std::realloc, std::aligned_alloc, std::free
Calls to these functions that allocate or deallocate a particular unit of storage occur in a single total order, and each such deallocation call happens-before the next allocation (if any) in this order.
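A short sketch of what that guarantee means in practice: several threads can allocate and free concurrently with no locking by the caller, because the allocation functions themselves provide the synchronization. The thread count and iteration count below are arbitrary:

```cpp
#include <thread>
#include <vector>

int main() {
    std::vector<std::thread> threads;
    for (int t = 0; t < 4; ++t) {
        threads.emplace_back([] {
            // No user-level mutex is needed around new/delete:
            // the library guarantees these calls are thread-safe.
            for (int i = 0; i < 100000; ++i) {
                int* p = new int(i);
                delete p;
            }
        });
    }
    for (auto& th : threads) th.join();
    return 0;
}
```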
With gcc, new is implemented by delegating to malloc, and we see that their malloc does indeed use a lock. If you are worried about your allocations causing a bottleneck, write your own allocator.
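If you do go down that route, the hook points are the user replacement versions of global operator new and operator delete mentioned above. The sketch below simply forwards to malloc/free, which is roughly what the default does anyway; a real replacement would substitute its own strategy where the comments indicate:

```cpp
#include <cstdlib>
#include <new>

// Sketch of user replacement functions for the global allocation operators.
void* operator new(std::size_t size) {
    if (size == 0) size = 1;              // must return a unique non-null pointer even for 0 bytes
    void* p = std::malloc(size);          // <-- a custom allocation strategy would go here
    if (p == nullptr) throw std::bad_alloc();
    return p;
}

void operator delete(void* p) noexcept {
    std::free(p);                         // <-- matching custom deallocation would go here
}

// Sized delete (C++14); forwarding keeps the pair consistent.
void operator delete(void* p, std::size_t) noexcept {
    std::free(p);
}
```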
The answer is yes, but in practice it is usually not a problem. If it is a problem for you, you may try replacing your implementation of malloc with tcmalloc, which reduces, but does not eliminate, possible contention (since there is still only one heap that needs to be shared among threads and processes).
TCMalloc assigns each thread a thread-local cache. Small allocations are satisfied from the thread-local cache. Objects are moved from central data structures into a thread-local cache as needed, and periodic garbage collections are used to migrate memory back from a thread-local cache into the central data structures.
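This is not TCMalloc's actual code, just a toy sketch of the thread-local-cache idea for a single block size: each thread keeps its own free list and only falls back to the shared (locked) malloc/free when its cache is empty or full. All names and limits here are invented for illustration:

```cpp
#include <cstdlib>
#include <vector>

namespace toy {

constexpr std::size_t kBlockSize = 64;    // one fixed size class, for simplicity
constexpr std::size_t kMaxCached = 256;   // cap on how many blocks a thread hoards

// thread_local: each thread gets its own cache, so no lock is needed to touch it.
// (In this toy, blocks still cached at thread exit are leaked; a real allocator
// would return them to central free lists instead.)
thread_local std::vector<void*> cache;

void* allocate() {
    if (!cache.empty()) {                 // fast path: pop from this thread's cache
        void* p = cache.back();
        cache.pop_back();
        return p;
    }
    return std::malloc(kBlockSize);       // slow path: go to the shared heap
}

void deallocate(void* p) {
    if (cache.size() < kMaxCached) {      // fast path: keep the block for reuse
        cache.push_back(p);
        return;
    }
    std::free(p);                         // cache full: hand it back to the shared heap
}

} // namespace toy
```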
There are also other options, such as using custom allocators, specialized containers, or redesigning your application.