Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to deal with object deletion in continuous allocation?

I've recently found benefits of Data Oriented Design. It looks very impressive. One of points is grouping data by type and access, not all together in objects, but in arrays, to prevent cache misses and for better processing.

So in game we still have instances and user is able to destroy any of them (not only last in array). I can't figure out how to effectively deal with object deletion in middle of array.

I have one idea: to have isAlive value, but that will will cause quite big impact on number of conditions, because every objects get checked many times in processing, drawing,...

Another idea is to shift whole array to fill space that has to be deleted, but this will consume much resources at deletion.

How can man deal with this in DOD?

So put up the requirements:

  • it has to be array(s) in order to reduce cache misses destribed in DOD
  • it has to have fast random position object deletion, max o(log n)
  • objects can't move since they were created, because they could be referenced at unknown places, so it will cause program misbehavior
like image 993
kravemir Avatar asked Oct 06 '12 19:10

kravemir


1 Answers

It's actually quite easy, don't use direct references. Use a layer of indirection such as IDs. For example:

Let's say you have a FooManager that manages all your Foo "objects" (not objects in the OOP sense, a collection of arrays for each Foo property). As I understand it, what you do right now is just return an index. Let's say Foo #42 is the Foo which data is located at index 42 of all the arrays. Now you want to delete Foo #42, which blows a hole in your array. You could move all other array entries but then Foo #43 no longer points at the actual index 43.

So we add an ID table and instead of returning the index, you return an ID. When you create a new Foo, you add its data to the first free index in the arrays (let's say 42). Then you generate an new unused ID (let's say 1337). Now you can update the ID table (a (hash)map is great for this) to say ID 1337 points to index 42. Now you can return the ID 1337 to the calling function. (Notice how the calling function never finds out where the data is actually stored, that's irrelevant information)

Next time the Foo needs to be updated from another piece of code, the ID is used. So the FooManager's setBar function is called with ID 1337 and the new Bar value as arguments. FooManager looks up 1337 in its ID table, finds out it's still located at index 42 and updates the Bar array index 42 with the new Bar value.

Now this is where this system gets it's value, let's remove Foo 1337. FooManager's removeFoo is called with ID 1337 as argument. It looks up 1337, it's located at index 42. However, in the mean time, 10 new Foos have been added. So what we can now do is just replace the values at index 42 with the values at 52, effectively moving 52 to 42. This would cause a big problem in the old system, but now, we only need to update the index table. So we look up which ID points to 52 (let's say it's 1401) and update it to 42. Next time someone tries to update the Foo with ID 1401, it looks it up in the index table and finds it is located at index 42.

So we've kept the data in continuous memory, a remove only costs a very low number of data copies operations (one copy for each property of Foo) and the code "outside" of FooManager never even realizes something happened. It even solves dead refence issues. Suppose some code still has the deleted 1337 ID (dangling refence, this is bad!), when it tries to access/change it, FooManager looks it up, sees 1337 doesn't exist (any more) and can generate a nice clean warning/error and callstack, which allows you to directly identify which code still has the dangling reference!

There is only one disadvantage, which is an extra lookup in the ID table. Now a hash table can be really fast, but it's still an extra operation for each time you modify a Foo object. However, in most cases, access from outside of a manager happens far less than access inside of a manager. When you have a BulletManager, it will update each bullet every frame, but accessing a Bullet to change/request something and calls to create/remove bullets are faaaar less likely to occur. If this is the other way around though, you should probably update your data structures to optimize for that situation. Then again, in such a situation, it doesn't matter that much any more where data is located in memory so you could live with either "holes" in your arrays or even using completely different data layouts.

like image 195
Mart Avatar answered Oct 18 '22 15:10

Mart