Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Any better alternative to std::vector<std::unique_ptr<T>>?

I'm looking for a container (targeted towards game-development, especially for entity management) that needs to satisfy these requirements:

  1. Fast iteration
  2. No copies of stored elements
  3. No invalidation of pointers to the elements
  4. Removal and insertion of elements

Example:

Container<Entity> container;

// This pointer will always point to the player
Entity* player{new Entity};          
container.add(player);               

// Set some entities to "dead"
for(auto& e : container) if(e->type == "Enemy") e->die(); 

// Use erase-remove idiom on "dead" entities
container.cleanup();                 

// Player pointer is still valid
player->doSomething();               

I've tried two different container types so far:

  • std::vector<std::unique_ptr<T>>
    1. Cache-friendly (fast iteration)
    2. No copies (thanks to std::unique_ptr)
    3. No invalidation of pointers (thanks to std::unique_ptr)

...and...

  • std::list<T>
    1. Non cache-friendly (slower iteration)
    2. No copies
    3. No invalidation of pointers

Even if it seems counter-intuitive, std::vector<std::unique_ptr<T>> is more performant than std::list<T> according to my benchmarks.

(For bigger types, std::list<T> is more performant during insertion, but std::vector<std::unique_ptr<T>> still wins).


I was wondering if there is a better alternative to std::vector<std::unique_ptr<T>>.

Ideally, the alternative should be cache-friendly, for fast iteration, and allow the user to refer to the same items even after adding/removing existing items (pointers should not be invalidated).

like image 842
Vittorio Romeo Avatar asked Dec 23 '13 15:12

Vittorio Romeo


2 Answers

You are doing the right thing by performance testing. That is the only true way to answer this question.

The only thing I know of that might be faster is to to create a buffer. Then create a custom allocator for the vector<unique_ptr<T>, custom_allocator<unique_ptr<T>>> which allocates from your buffer.

Also allocate your objects from the same buffer (such that the unique_ptr's point into the buffer).

To do this you would have to know upper limits, or write overflow logic for when your limits are exceeded.

Have the custom allocator grow from the middle of the buffer upwards.

Have the allocation for the unique_ptrs grow from the middle of the buffer downwards.

As long as the entire buffer fits in a cache line, you'll be fast as possible. This is not trivial to implement and your current solution may well be good enough.

like image 200
Howard Hinnant Avatar answered Oct 22 '22 13:10

Howard Hinnant


That sounds like if a plain std::vector is what you want.

It is by far the fastest iteration possible as you are doing one sweep over a continuous part of the memory. You can only do better by looking into the structure of T.

Inserting new elements can be done using push_back in O(1). Deleting a single element can be done with a combination of swap and pop_back in O(1). Deleting several elements can be done using a combination of remove_if and erase.

You can not store pointers as they will be invalidated. However, what you can do is store the index and make the vector nearly globally available.

Using indices instead of pointers has two further advantages:

  1. ints use 4 bytes whereas pointers use 8 bytes. Using pointers automatically leads to a larger memory footprint and thus to more cache misses. I doubt that you will have more than 2^32 entities so 8 bytes are overkill.
  2. You can use a structure of arrays instead of an array of structures.

To illustrate point 2 look at the following two examples

struct Unit{
    int type, health;
};
vector<Unit>units;
for(int i=0; i<(int)units.size(); ++i)
    if(units[i].type == medic)
        units[i].health += 10;

vs

vector<int>type, health;
for(int i=0; i<(int)type.size(); ++i)
    if(type[i] == medic)
        health[i] += 10;

Suppose that there are only very few medics. In the first code snippet the whole memory has to be piped through the cache as types and health values are adjacent in memory. In the second example only all types have to be loaded and the few health values of the medics. In the sum less memory has to be loaded, which leads to faster iteration.

like image 43
Someone Anonymous Avatar answered Oct 22 '22 13:10

Someone Anonymous