
Using multiple allocators efficiently

I have been researching switching my allocation method from simply overloading new to using multiple allocators throughout the code base. However, how can I use multiple allocators efficiently? The only way I could devise through my research was to make the allocators globals, but that seems problematic, since it is typically a "bad idea" to rely on many globals.

I am looking to find out how to use multiple allocators efficiently. For example, I may have one allocator used only by a particular subsystem and a different allocator for another subsystem. I am not sure whether the only way to do this is with multiple global allocators, so I am hoping for better insight and a better design.

asked Mar 10 '12 by chadb




3 Answers

In C++2003 the allocator model is broken and there isn't really a proper solution. For C++2011 the allocator model was fixed and you can have per-instance allocators which are propagated down to contained objects (unless, of course, you choose to replace them). Generally, for this to be useful you probably want to use a dynamically polymorphic allocator type, which the default std::allocator<T> is not required to be (and generally I would expect it not to be dynamically polymorphic, although this may be the better implementation choice). However, [nearly] all classes in the standard C++ library which do memory allocation are templates which take the allocator type as a template argument (the IOStreams, for example, are an exception, but they generally don't allocate a significant enough amount of memory to warrant adding allocator support).
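
This answer predates it, but the std::pmr machinery added in C++17 standardizes exactly this kind of dynamically polymorphic allocator; a minimal sketch of the idea:

    // Sketch using std::pmr (C++17): the same container type can be backed by
    // different memory resources chosen at run time.
    #include <iostream>
    #include <memory_resource>
    #include <string>
    #include <vector>

    int main() {
        std::pmr::monotonic_buffer_resource arena(64 * 1024);  // bump-allocates from a 64 KiB block
        std::pmr::unsynchronized_pool_resource pool;           // pooled, reusable resource

        // Same element type, different allocation strategy, no extra template parameter.
        std::pmr::vector<std::pmr::string> a{&arena};
        std::pmr::vector<std::pmr::string> b{&pool};

        a.emplace_back("allocated from the arena");
        b.emplace_back("allocated from the pool");

        std::cout << a.front() << '\n' << b.front() << '\n';
    }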

In several of your comments you are insisting that allocators effectively need to be global: that is definitely not correct. Each allocator-aware type stores a copy of the allocator it is given (at least, if the allocator has any instance-level data; if it doesn't, there isn't anything to store, as is the case with the default allocator using operator new() and operator delete()). This effectively means that the allocation mechanism given to an object needs to stick around as long as there is any active allocator using it. This can be done with a global object, but it can also be done using, e.g., reference counting, or by associating the allocator with an object that contains all the objects to which it is given. For example, if each "document" (think XML, Excel, Pages, whatever structured file) passes an allocator to its members, the allocator can live as a member of the document and get destroyed when the document is destroyed, after all of its content is destroyed.

This part of the allocator model should work with pre-C++2011 classes as well, as long as they take an allocator argument. However, pre-C++2011 classes won't pass the allocator on to contained objects. For example, if you give an allocator to a std::vector<std::string>, the C++2011 version will create the std::strings using the allocator given to the std::vector<std::string>, appropriately converted to deal with std::strings. This won't happen with pre-C++2011 allocators.
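
As a hedged sketch of the document-owned allocator idea (the Document class and its members are hypothetical, and it uses the C++17 std::pmr types for brevity):

    #include <cstddef>
    #include <memory_resource>
    #include <string>
    #include <vector>

    class Document {
        // Declared first so it is destroyed last, after every member that
        // allocates from it has already been destroyed.
        std::pmr::monotonic_buffer_resource arena_{1 << 20};   // 1 MiB initial block
        std::pmr::vector<std::pmr::string> lines_{&arena_};

    public:
        void add_line(const char* text) {
            // The vector's polymorphic allocator is propagated to the contained
            // strings, so the character data also comes out of the document's arena.
            lines_.emplace_back(text);
        }
        std::size_t line_count() const { return lines_.size(); }
    };  // destroying a Document releases its whole arena in one go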

To actually use allocators in a subsystem you will effectively need to pass them around, either explicitly as an argument to your functions and/or classes, or implicitly by way of allocator-aware objects which serve as a context. For example, if you use any of the standard containers as [part of] the context being passed around, you can obtain the allocator it uses via its get_allocator() method.
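
For illustration (the function and the choice of pmr types are assumptions, not from the answer), a subsystem routine can recover the allocator from a container it already receives as context:

    #include <memory_resource>
    #include <string>
    #include <vector>

    // Hypothetical subsystem function: it never names a global allocator; it
    // reuses whatever memory resource the caller's container was created with.
    std::pmr::string make_label(const std::pmr::vector<std::pmr::string>& context, int id) {
        std::pmr::string label("item-", context.get_allocator());  // same resource as the vector
        label += std::to_string(id).c_str();
        return label;
    }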

answered Oct 06 '22 by Dietmar Kühl


You can use placement new. This can be used either to construct an object at a memory location you specify, or together with an overload of the type's static void* operator new(ARGS). Globals are not required, and they are really a bad idea here if efficiency is important and your problems are demanding. You would need to hold on to one or more allocators, of course.
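
A minimal sketch of both forms, assuming a hypothetical toy Arena type:

    #include <cstddef>
    #include <new>

    struct Arena {
        // Toy bump allocator; a real one would handle alignment, exhaustion and reuse.
        alignas(std::max_align_t) unsigned char buffer[4096];
        std::size_t used = 0;
        void* allocate(std::size_t n) {
            void* p = buffer + used;
            used += n;
            return p;
        }
    };

    struct Particle {
        float x, y, z;

        // Class-specific operator new that takes the allocator explicitly.
        static void* operator new(std::size_t size, Arena& arena) {
            return arena.allocate(size);
        }
        // Matching placement delete; called only if the constructor throws.
        static void operator delete(void* /*p*/, Arena&) noexcept {}
    };

    int main() {
        Arena arena;

        // Form 1: the class-specific operator new overload routed to the arena.
        Particle* p = new (arena) Particle{1.0f, 2.0f, 3.0f};

        // Form 2: plain placement new into memory the caller obtained itself
        // (the :: bypasses the class-specific overloads above).
        void* raw = arena.allocate(sizeof(Particle));
        Particle* q = ::new (raw) Particle{4.0f, 5.0f, 6.0f};

        p->~Particle();   // destroy explicitly; the arena's memory is reclaimed wholesale
        q->~Particle();
    }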

The best thing you can do is understand your problems and create strategies for your allocators based on the patterns in your program and on actual usage. The general-purpose malloc is very good at what it does, so always use it as a baseline to measure against. If you don't know your usage patterns, your allocator will likely be slower than malloc.

Also keep in mind that these types will lose compatibility with the standard containers, unless you use a global or thread-local custom allocator for the standard containers -- which quickly defeats the purpose in many contexts. The alternative is to also write your own allocators and containers.

answered Oct 06 '22 by justin


Some uses for multiple allocators include reduced CPU usage, reduced fragmentation, and fewer cache misses. So the solution really depends on what type of allocation bottleneck you have and where it is.

CPU usage will be improved by having lockless heaps for active threads, eliminating synchronization. This can be done in your memory allocator with thread-local storage.
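
A hedged sketch of that idea (all names hypothetical): each thread gets its own bump arena through thread_local, so the hot allocation path takes no lock:

    #include <cstddef>
    #include <vector>

    struct ThreadArena {
        // Toy bump allocator: no alignment handling, no overflow check, no free.
        std::vector<unsigned char> block = std::vector<unsigned char>(1 << 20);
        std::size_t used = 0;

        void* allocate(std::size_t n) {
            void* p = block.data() + used;
            used += n;
            return p;
        }
    };

    inline ThreadArena& this_thread_arena() {
        thread_local ThreadArena arena;   // one instance per thread, so no locking is needed
        return arena;
    }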

Fragmentation will be improved by having allocations with different lifespans come from different heaps -- allocating background IO in a separate heap from the user's active task will ensure the two do not confound one another. This can be done by keeping a stack of heaps and pushing/popping as you enter and leave different functional scopes.
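
For example (sketch only, every name hypothetical), a stack of heaps with an RAII guard so each functional scope allocates from the heap pushed for it:

    #include <vector>

    struct Heap;   // some region/pool type; details omitted in this sketch

    class HeapStack {
        std::vector<Heap*> heaps_;
    public:
        void push(Heap* h) { heaps_.push_back(h); }
        void pop()         { heaps_.pop_back(); }
        Heap* current() const { return heaps_.back(); }   // heap used for new allocations
    };

    // RAII guard: the scope's heap is popped even on early return or exception.
    class ScopedHeap {
        HeapStack& stack_;
    public:
        ScopedHeap(HeapStack& stack, Heap* h) : stack_(stack) { stack_.push(h); }
        ~ScopedHeap() { stack_.pop(); }
        ScopedHeap(const ScopedHeap&) = delete;
        ScopedHeap& operator=(const ScopedHeap&) = delete;
    };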

Cache misses will be improved by keeping allocations within a system together. Having quadtree/octree allocations come from their own heap will guarantee there is locality in view frustum queries. This is best done by overloading operator new and operator delete for the specific classes (e.g. OctreeNode).
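
A hedged sketch of that last point (the chunked pool and its sizes are assumptions, not from the answer): routing every OctreeNode through one pool keeps the nodes close together in memory:

    #include <cstddef>
    #include <memory>
    #include <vector>

    class NodePool {
        static constexpr std::size_t kSlot = 64;            // assumed >= sizeof(OctreeNode)
        static constexpr std::size_t kSlotsPerChunk = 1024; // nodes laid out contiguously per chunk
        std::vector<std::unique_ptr<unsigned char[]>> chunks_;
        std::size_t used_ = kSlotsPerChunk;                 // forces a chunk on first use
    public:
        void* take() {
            if (used_ == kSlotsPerChunk) {
                chunks_.emplace_back(new unsigned char[kSlot * kSlotsPerChunk]);
                used_ = 0;
            }
            return chunks_.back().get() + kSlot * used_++;
        }
        void give(void*) noexcept { /* toy: memory is reclaimed when the pool is destroyed */ }
    };

    inline NodePool g_octree_pool;   // could equally be owned by the octree itself

    struct OctreeNode {
        OctreeNode* children[8] = {};

        static void* operator new(std::size_t)         { return g_octree_pool.take(); }
        static void  operator delete(void* p) noexcept { g_octree_pool.give(p); }
    };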

answered Oct 06 '22 by Roger Hanna