 

Custom heap pre-allocation for entity-component system

I have an OOP entity-component system that currently works like this:

// In the component system
struct Component { virtual void update() = 0; };
struct Entity
{
    bool alive{true};
    vector<unique_ptr<Component>> components;
    void update() { for(const auto& c : components) c->update(); }
};

// In the user application
struct MyComp : Component
{
    void update() override { ... }
};

To create new entities and components, I use C++'s usual new and delete:

// In the component system
struct Manager
{
    vector<unique_ptr<Entity>> entities;
    Entity& createEntity() 
    { 
        auto result(new Entity);
        entities.emplace_back(result);
        return *result;
    }
    template<typename TComp, typename... TArgs>
        TComp& createComponent(Entity& mEntity, TArgs... mArgs)
    {
        auto result(new TComp(forward<TArgs>(mArgs)...));
        mEntity.components.emplace_back(result);
        return *result;
    }
    void removeDead() { /* remove all entities with 'alive == false' - 'delete' is called here by the 'unique_ptr' */ }
};

// In the user application
{
    Manager m;
    auto& myEntity(m.createEntity());
    auto& myComp(m.createComponent<MyComp>(myEntity));
    // Do stuff with myEntity and myComp
    m.removeDead();
}

The system works fine, and I like the syntax and flexibility. However, when continuously adding and removing entities and components to the manager, memory allocation/deallocation slows down the application. (I've profiled and determined that the slow down is caused by new and delete).

I've recently read that it's possible to pre-allocate heap memory in C++ - how can that be applied to my situation?


Desired result:

// In the user application
{
    Manager m{1000}; 
    // This manager can hold about 1000 entities with components 
    // (may not be 1000 because of dynamic component size, 
    // since the user can define their own components, but that's ok for me)

    auto& myEntity(m.createEntity());
    auto& myComp(m.createComponent<MyComp>(myEntity));
    // Do stuff with myEntity and myComp

    m.removeDead(); 
    // No 'delete' is called here! Memory of the 'dead' entities can
    // be reused for new entity creation
}
// Manager goes out of scope: 'delete' is called here  
Vittorio Romeo asked Jul 22 '13 14:07


3 Answers

There are a few things you can do to make the implementation of your design scale better.

In your current implementation there are up to two memory allocations per Entity and Component: one to allocate the object itself, and a second when its pointer is inserted into the vector. The second allocation happens whenever the vector runs out of capacity and has to allocate a bigger array and move the old elements into it.

In this case the best you can do is to use intrusive lists. That is, each Entity and Component also becomes a list node, so once an object has been allocated, no extra memory allocation is necessary to put it into a list. Use a singly- or doubly-linked list from Boost.Intrusive, or write your own. This is how the Linux kernel keeps track of many different kinds of objects.

The next step is to preallocate Entity and Component elements. Preallocating could be something as simple as a global array of these, or something more sophisticated, such as Boost.Pool. There are quite a few ways to build a memory pool of objects.

Once Entity and Component are preallocated and intrusive lists are used you are done.

An example which uses boost components:

#include <boost/intrusive/list.hpp>
#include <boost/pool/pool_alloc.hpp>
#include <new>

namespace bi = boost::intrusive;

// api.h

//
// Object pooling support begin.
//
template<class T>
struct Pool
{
    static boost::pool_allocator<T> pool;
};

// Singleton. Although it is defined in the header, the linkers
// make sure there is only one instance of it in the application.
// It is instantiated on demand when Pool<T> is used.
template<class T>
boost::pool_allocator<T> Pool<T>::pool;

template<class Derived>
struct Pooled // use it on the most derived class only, not on intermediate base classes
{
    // Automatically use the object pool for plain new/delete.
    static void* operator new(size_t) { return Pool<Derived>::pool.allocate(1); }
    static void operator delete(void* p) { return Pool<Derived>::pool.deallocate(static_cast<Derived*>(p), 1); }
};
//
// Object pooling support end.
//

// Using bi::list_base_hook<bi::link_mode<bi::auto_unlink> > because it automatically
// unlinks from the list when the object is destroyed. No need to manually
// remove the object from the list when an object is about to be destroyed.

struct Component
    : bi::list_base_hook<bi::link_mode<bi::auto_unlink> > // make it an intrusive list node
{
    virtual void update() = 0;
    virtual ~Component() {}
};

struct Entity
    : bi::list_base_hook<bi::link_mode<bi::auto_unlink> > // make it an intrusive list node
    , Pooled<Entity> // optional, make it allocated from the pool
{
    bool active = false;

    bi::list<Component, bi::constant_time_size<false> > components;

    ~Entity() {
        for(auto i = components.begin(), j = components.end(); i != j;)
            delete &*i++; // i++ to make sure i stays valid after the object is destroyed
    }

    void update() {
        for(auto& c : components)
            c.update();
    }
};

struct Manager
{
    bi::list<Entity, bi::constant_time_size<false> > entities;

    ~Manager() {
        for(auto i = entities.begin(), j = entities.end(); i != j;)
            delete &*i++; // i++ to make sure i stays valid after the object is destroyed
    }

    Entity& createEntity() {
        auto result = new Entity;
        entities.push_back(*result);
        return *result;
    }

    template<typename TComp, typename... TArgs>
    TComp& createComponent(Entity& mEntity, TArgs... mArgs)
    {
        auto result = new TComp(std::forward<TArgs>(mArgs)...);
        mEntity.components.push_back(*result);
        return *result;
    }

    void removeDead() {
        for(auto i = entities.begin(), j = entities.end(); i != j;) {
            auto& entity = *i++;
            if(!entity.active)
                delete &entity;
        }
    }
};

// user.cc
struct MyComp
    : Component
    , Pooled<MyComp> // optional, make it allocated from the pool
{
    void update() override {}
};

int main() {
    Manager m;
    auto& myEntity(m.createEntity());
    auto& myComp(m.createComponent<MyComp>(myEntity));
    m.removeDead();
}

In the above example boost::pool_allocator<T> actually uses new under the hood to grow the pool, and then keeps reusing destroyed objects rather than releasing their memory with delete. You can do better by preallocating all objects up front, but there are many ways to do so depending on your requirements, so boost::pool_allocator<T> is used here for simplicity, to avoid hair splitting. You can change the implementation of Pooled<T> to something like Pooled<T, N>, where N stands for the maximum number of objects; the rest of the code stays the same, because it uses plain new/delete, which happen to be overridden for objects allocated from a pool.

Maxim Egorushkin answered Nov 02 '22 15:11


C++ supports class-specific memory pools for this kind of thing. The general-purpose new/delete pair inevitably trades off among

  • Time spent searching for a free block of the right size to meet each request
  • Time spent coalescing free blocks
  • Time spent maintaining and perhaps reorganizing internal data structures to make the above two operations faster.

The primary way to gain speed is to avoid these tradeoffs entirely with custom allocators that - as you say - pre-allocate a big chunk of memory viewed as a simple array of free objects, all of the same size. Initially these are all linked on a free list, where the link pointers occupy the first bytes of each block, "overlaid" where the data will eventually go. Allocation is just unchaining a block from the head of the free list - a "pop" operation needing about two instructions. Deallocation is a "push": two more instructions. In many cases, memory hardware can be set to generate a trap when the pool is empty, so there is no per-allocation overhead for detecting this error condition. (In GC systems the same trick is used to initiate collection with no overhead.)

In your case you'd need two pools: one for Entities and one for Components.

Defining your own pool allocator is not so hard, especially if your application is single threaded. See this document for a tutorial treatment.

Gene answered Nov 02 '22 16:11


Using most of answers and Google as references, I implemented some pre-allocation utilities in my SSVUtils library.

Prealloc.h

Example:

using MemUnit = char;
using MemUnitPtr = MemUnit*;
using MemSize = decltype(sizeof(MemUnit)); // Should always be 1 byte

class MemBuffer
{
    Uptr<MemUnit[]> buffer;
    MemRange range;

    MemBuffer(MemSize mSize) : ... 
    { 
        // initialize buffer from mSize
    }
};

class PreAllocatorChunk
{
    protected:
        MemSize chunkSize;
        MemBuffer buffer;
        std::stack<MemRange> available;

    public:
        PreAllocatorChunk(MemSize mChunkSize, unsigned int mChunks) : ...
        {
            // Add mChunks chunks to 'available'...
        }

        template<typename T, typename... TArgs> T* create(TArgs&&... mArgs)
        {
            // create on first "chunk" using placement new
            auto toUse(available.top().begin); available.pop();
            return new (toUse) T{std::forward<TArgs>(mArgs)...};
        }
};

More pre-allocation utilities are available:

  • PreAllocatorDynamic: pre-allocates a big buffer, then, when creating an object, splits the buffer in two parts:

    • [buffer start, buffer start + obj size)
    • [buffer start + obj size, buffer end)

    When an object is destroyed, its occupied memory range is marked as "available". If, during creation of a new object, no big enough "chunk" is found, the pre-allocator tries to unify contiguous memory chunks before throwing a runtime exception. This pre-allocator is sometimes faster than new/delete, but it greatly depends on the size of the pre-allocated buffer.

  • PreAllocatorStatic<T>: inherits from PreAllocatorChunk. The size of a chunk is equal to sizeof(T). The fastest but least flexible pre-allocator; almost always faster than new/delete.

Vittorio Romeo answered Nov 02 '22 16:11