I've read on a C++ optimization cookbook that the standard allocator for STL containers such as std::list, std::set, std::multi_set, std::map, e std::multi_map can be replaced by a more performant block allocator.
A block allocator has higher performance, low fragmentation and efficient data caching.
I've found on the web the FSBAllocator which claims to be faster than the standard. http://warp.povusers.org/FSBAllocator/
I've tried it with std::map and found the seems to be faster indeed, but my question is how can be the STL implementation be so slower than a specific allocator and what are the drawbacks of another allocator than the standard, in terms of both portability and robustness? My code must compile on a variety of architectures (win32, osx, linux). Do someone had experience with that kind of fixed size block allocator?
A block allocator does one large allocation to the free store/heap, it then internally splits this memory into chunks. One drawback here is that it allocates this chunk ( which needs to be large, and often user specified on a per use case basis ) straight up, so even if you don't use all of it, that memory is tied up. Secondly the standard memory allocator is built on top of new / delete, which in turn is often built on top of malloc / free. Although I don't recall whether malloc / free is guaranteed to be thread safe under all circumstances, it usually is.
But lastly the reason for why block allocators work so well is because they have information that is not present to the standard allocators, and they don't need to cover a very wide set of use cases. For example, if you did std::map< int, int >()
and it allocated 1mb you'd probably be pissed, but if you do std::map< int, int, std::less< int >, block_alloc< 1024 * 1024 > >()
you'd be expecting it. The standard allocators doesn't allocate in blocks, they request new memory via new, new in turn have no context at all. It gets a memory request of an arbitrary size and needs to find a contiguous number of bytes to return. What most implementations does is that they have a set of memory areas which they maintain of different multiples ( for example, a region for 4 bytes is more or less probably guaranteed to be present, as requests for 4 bytes is very common ). If a request isn't an even multiple it gets harder to return a good chunk without wasting space and cause fragmentation. Basically memory management of arbitrary sizes is very very hard to do if you want it to be close to constant time, low fragmentation, thread safe etc.
The Boost pool allocator documentation has some good info on how a good block allocator can work.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With