I am currently evaluating a few scalable memory allocators, namely nedmalloc and ptmalloc (both built on top of dlmalloc), as a replacement for the default malloc / new because of significant contention seen in a multithreaded environment. Their published performance seems to be good; however, I would like to hear about the experiences of other people who have really used them.
Allocators handle all the requests for allocation and deallocation of memory for a given container. The C++ Standard Library provides general-purpose allocators that are used by default; however, custom allocators may also be supplied by the programmer.
This allocator provides implementations of the standard C routines malloc(), free(), and realloc(), as well as a few auxiliary utility routines. The allocator has never been given a specific name. Most people just call it Doug Lea's Malloc, or dlmalloc for short.
Allocators are used by the C++ Standard Library to handle the allocation and deallocation of elements stored in containers. All C++ Standard Library containers except std::array have a template parameter of type allocator<Type>, where Type represents the type of the container element.
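To make the allocator template parameter concrete, here is a minimal sketch of a custom allocator plugged into std::vector. The names CountingAllocator, g_alloc_calls, g_dealloc_calls and count_vector_allocations are made up for this illustration; the allocator simply forwards to malloc/free and counts calls, and the counters are not thread-safe.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Hypothetical counters, for illustration only (not thread-safe).
static std::size_t g_alloc_calls = 0;
static std::size_t g_dealloc_calls = 0;

// Minimal allocator: forwards to malloc/free and counts the calls.
// Assumes T needs no more alignment than malloc guarantees.
template <typename T>
struct CountingAllocator {
    using value_type = T;

    CountingAllocator() noexcept = default;

    // Converting constructor, so containers can rebind to other element types.
    template <typename U>
    CountingAllocator(const CountingAllocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        // Guard against overflow in n * sizeof(T).
        if (n > static_cast<std::size_t>(-1) / sizeof(T)) throw std::bad_alloc();
        void* p = std::malloc(n * sizeof(T));
        if (!p) throw std::bad_alloc();
        ++g_alloc_calls;
        return static_cast<T*>(p);
    }

    void deallocate(T* p, std::size_t) noexcept {
        ++g_dealloc_calls;
        std::free(p);
    }
};

// The allocator is stateless, so all instances compare equal.
template <typename T, typename U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) noexcept {
    return true;
}
template <typename T, typename U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) noexcept {
    return false;
}

// Usage: returns how many times the vector asked the allocator for memory.
std::size_t count_vector_allocations() {
    const std::size_t before = g_alloc_calls;
    {
        std::vector<int, CountingAllocator<int>> v;
        v.reserve(100);  // request the storage up front
        for (int i = 0; i < 100; ++i) v.push_back(i);
    }  // the vector is destroyed here and returns its storage
    return g_alloc_calls - before;
}
```

A replacement allocator such as the ones discussed in this thread could be wired in the same way, by calling its allocation and free functions inside allocate() and deallocate() instead of malloc/free.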
I have integrated NedMalloc into our application and I am quite content with the results. The contention I had seen before was gone, and the allocator was quite easy to plug in. General performance was very good too, to the point that the overhead of memory allocations in our application is now close to unmeasurable.
I did not try ptmalloc, as I could not find a Windows-ready version of it, and I lost motivation once NedMalloc worked fine for me.
Besides the two mentioned, I think it could also be interesting to try TCMalloc. It has some features which sound better than NedMalloc in theory (like very little overhead for small allocations, compared to the 4 B header used by NedMalloc); however, as it does not seem to have a Windows port ready, it might turn out to be not exactly easy.
After a few weeks of using NedMalloc I was forced to abandon it, because its space overhead proved to be too high for us. What hit us in particular was that NedMalloc seems to do a poor job of returning memory it no longer uses to the OS, keeping most of it still committed. For now I have replaced it with JEMalloc, which is not quite as fast (it is still fast, but not as fast as NedMalloc was), but it is very robust in this respect and its scalability is also very good.
And after a few months of using JEMalloc I have switched to TCMalloc. It took more effort to adapt it for Windows compared to the other ones, but its results (both performance and fragmentation) seem to be the best for us of what I have tested so far.
In the past I have needed a very fast way to allocate memory. I found that there wasn't an allocator that was up to the job.
After a couple of days of searching I came upon boost::pool, which gave our application a 300x performance increase.
We effectively just call malloc/free on the objects we want to create. There is a little setup overhead, since a large amount of memory has to be malloc'ed to begin with, but once that is done, it is very fast.
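The idea behind a pool (one large malloc up front, then very cheap allocate/free of same-sized blocks) can be sketched with a free list. This is not boost::pool's actual implementation or API; FixedPool and its members are hypothetical names for illustration, and the sketch is single-threaded and assumes a non-zero block count.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical fixed-size block pool: one upfront malloc, then O(1)
// allocate/deallocate via an intrusive free list. Not thread-safe.
class FixedPool {
public:
    // block_count is assumed to be non-zero.
    FixedPool(std::size_t block_size, std::size_t block_count)
        : block_size_(round_up(block_size)),
          buffer_(static_cast<unsigned char*>(
              std::malloc(block_size_ * block_count))),
          free_list_(nullptr) {
        if (!buffer_) throw std::bad_alloc();
        // Thread every block onto the free list, last block first,
        // so that the first allocate() hands out the first block.
        for (std::size_t i = block_count; i-- > 0;) {
            free_list_ = new (buffer_ + i * block_size_) Node{free_list_};
        }
    }

    ~FixedPool() { std::free(buffer_); }

    FixedPool(const FixedPool&) = delete;
    FixedPool& operator=(const FixedPool&) = delete;

    // Pops a block off the free list; returns nullptr when the pool is empty.
    void* allocate() {
        if (!free_list_) return nullptr;
        Node* n = free_list_;
        free_list_ = n->next;
        return n;
    }

    // Pushes a block back; p must have come from allocate() on this pool.
    void deallocate(void* p) {
        if (!p) return;
        free_list_ = new (p) Node{free_list_};
    }

private:
    struct Node { Node* next; };

    // Each block must be able to hold a Node and stay suitably aligned.
    static std::size_t round_up(std::size_t n) {
        const std::size_t a = alignof(std::max_align_t);
        if (n < sizeof(Node)) n = sizeof(Node);
        return (n + a - 1) / a * a;
    }

    std::size_t block_size_;
    unsigned char* buffer_;
    Node* free_list_;
};
```

Both operations are a couple of pointer moves with no searching and no locks, which is where the speedup over a general-purpose malloc comes from. For multithreaded use, one pool per thread or a lock around the pool would be needed.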
I tried to go your path a while ago when faced with multi-threaded contention and a severe fragmentation problem. After quite a bit of testing I concluded that the benefit of these allocators is negligible in most of the interesting cases I had.
The real solution was to roll my own memory manager, specialized for the tasks I was doing most often.
If you are on Win32 my experience has been that it's hard to beat the regular Windows heap manager provided you enable the Low Fragmentation Heap using the HeapSetInformation API. I believe this is now standard on newer versions of Windows. It handles locking using Interlocked* Win32 primitives rather than simpler Mutex/CritSec locking.
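For reference, enabling the Low Fragmentation Heap on the default process heap looks roughly like the sketch below. The helper name enable_low_fragmentation_heap and the kIsWindows constant are made up for this example; on non-Windows builds the helper is a stub that returns false. Whether malloc / new actually draw from the process heap depends on the C runtime in use, and the call may fail when the process runs with debug heap options enabled.

```cpp
#ifdef _WIN32
#include <windows.h>
#endif

// True when compiled for Windows; elsewhere the helper below is a stub.
#ifdef _WIN32
constexpr bool kIsWindows = true;
#else
constexpr bool kIsWindows = false;
#endif

// Hypothetical helper: asks Windows to use the Low Fragmentation Heap for
// the default process heap. Returns true on success, false on failure or
// when built for a non-Windows platform.
bool enable_low_fragmentation_heap() {
#ifdef _WIN32
    ULONG heap_mode = 2;  // 2 selects the LFH for HeapCompatibilityInformation
    return HeapSetInformation(GetProcessHeap(),
                              HeapCompatibilityInformation,
                              &heap_mode,
                              sizeof(heap_mode)) != FALSE;
#else
    return false;
#endif
}
```

The same call can be applied to a private heap created with HeapCreate by passing that handle instead of GetProcessHeap().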