 

Threading Box2D with pthreads

So I'm essentially trying to implement an AIR Native Extension (ANE) that does the physics simulation in C, with an interface exposed to ActionScript.

I've gone through quite a few iterations, which I'll list below for interest's sake, and I'm at what I think could be my final attempt at getting this to perform better.

Ultimately, I'm looking for help with how I should set up a threading environment that runs the Box2D simulation on a separate thread, with AS3 polling for state.

Methods:

  1. Brute Force:

In this method I simply call into C from AS3 and tell it to create a world and pass it some boxes to add to this world. Every frame in AS3, I call into C to tell the world to Step, then loop through all the bodies in the world, get their position and rotation, convert them to ActionScript objects, put them in an ActionScript array and send that back to AS3. Once there, I loop through the returned array and assign those position and rotation values to my sprites so they visually update.

The results are actually quite decent: about 116 boxes can be added before the framerate suffers, compared to 30 boxes in a pure AS3 implementation. Note that these numbers are from Debug mode. In Release mode, both make it to about 120 boxes, so there is little difference between the AS3 implementation and the Native Extension implementation.

  2. ByteArray Sharing

To improve performance, I decided to try to limit the amount of data being marshalled between C and AS3. ANEs support sharing a ByteArray's memory, so I send the ByteArray created in AS3 to C and have C simply update it. This saves us from having to construct AS3 objects in C and pass them back. Every frame, AS3 simply needs to iterate through its ByteArray, read what C has written into it, and assign those values to the sprites to set the visual state.
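To make the shared-buffer idea concrete, here is a minimal C sketch of the packing step, assuming a fixed record of four doubles per body (`id`, `x`, `y`, `rotation`). The `BodyState` struct, the `pack_states` function and this record layout are hypothetical, not taken from the pastebin code. In the real extension the destination would presumably be the `bytes` pointer of an `FREByteArray` obtained with `FREAcquireByteArray` (and released with `FREReleaseByteArray`), and the values would come from Box2D.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-body record; in the real extension these values would
 * come from Box2D (b2Body::GetPosition() and GetAngle()). */
typedef struct {
    double id;       /* unique id assigned on the AS3 side */
    double x;
    double y;
    double rotation; /* radians, as Box2D reports it */
} BodyState;

#define DOUBLES_PER_BODY 4
#define BYTES_PER_BODY (DOUBLES_PER_BODY * sizeof(double))

/* Writes [id, x, y, rotation] for each body into buf as native-endian
 * doubles. Returns the number of bytes written, or 0 if buf is too small. */
size_t pack_states(const BodyState *bodies, size_t count,
                   uint8_t *buf, size_t buf_len)
{
    size_t needed = count * BYTES_PER_BODY;
    if (needed > buf_len)
        return 0;
    for (size_t i = 0; i < count; ++i) {
        const double rec[DOUBLES_PER_BODY] = {
            bodies[i].id, bodies[i].x, bodies[i].y, bodies[i].rotation
        };
        /* memcpy avoids assuming the shared buffer is double-aligned */
        memcpy(buf + i * BYTES_PER_BODY, rec, BYTES_PER_BODY);
    }
    return needed;
}
```

Because C writes native-endian doubles (little-endian on x86 and typical ARM targets) while an AS3 `ByteArray` defaults to big-endian, the AS3 side would need `endian = Endian.LITTLE_ENDIAN` before reading four `readDouble()` values per body, and would convert the radians to degrees for `Sprite.rotation`.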

The results here are sadly about the same. Improvements are only marginal.

  3. Direct Object Setting From C

Another thing ANEs are capable of is setting a property on an object that lives in AS3. With this I aimed to eliminate the overhead of passing data back to AS3, of looping through the bodies to collect data in C, and of looping through them again in AS3 to assign the values. I modified the Box2D code directly so that whenever a body's values changed, it would write the new x, y and rotation values straight onto the corresponding Sprite.

The results are amazing with very few objects, since each call to set these properties takes well under a millisecond. The problem is that the cost scales linearly with the number of objects, and at around 90 objects the overhead becomes too severe and things start to slow down.

  4. Threading

At this point I was a bit stumped. There's overhead in marshalling data, there's a cost in C for iterating and constructing the data to return and there's a cost in AS3 for iterating to assign values to the sprites.

Obviously there needs to be a trade-off, so my current solution is the best I can come up with for now.

On the AS3 side you call into C to create your world, call in again to add a box to that world, and call in to tell C you want a refresh of your data. When boxes are created in AS3 they get a unique id, and they are stored in a dictionary keyed by that id.

On the C side, the world is created and a new pthread is spawned to do the Step, essentially simulating the world on another thread. After each step, it assembles all the data and writes it into an array of doubles. Then it does so again and again. It basically simulates forever on its own thread.

When we call in to C to add a new box, I need to create the box and add it to that world. Since the world may be in the middle of a Step, this could cause problems, which I'm pretty sure means I need mutexes.

The same applies when we call in to get the values refreshed in AIR: I want to memcpy from the array of doubles into my AS3 ByteArray and then loop through the ByteArray to set the values on the visuals.
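For comparison with a hand-rolled lock, this "simulate forever" design could be written with a standard `pthread_mutex_t` roughly as below. This is a sketch under assumptions: the `World` struct, its gravity-only `world_step` (a stand-in for `b2World::Step`) and the `world_*` function names are hypothetical. One mutex guards both the world and the published snapshot, so `world_add_box` and `world_snapshot` can be called from the AIR thread while the simulation thread keeps stepping.

```c
#include <pthread.h>
#include <stddef.h>
#include <string.h>

#define MAX_BODIES 256

/* Hypothetical minimal world: bodies just fall under gravity. In the real
 * extension this would wrap a b2World and call b2World::Step. */
typedef struct {
    double x[MAX_BODIES], y[MAX_BODIES], vy[MAX_BODIES];
    size_t count;
    double out[MAX_BODIES * 2]; /* latest [x, y] snapshot for AS3 */
    int running;
    pthread_mutex_t lock;       /* guards every field above */
    pthread_t thread;
} World;

static void world_step(World *w, double dt)
{
    for (size_t i = 0; i < w->count; ++i) {
        w->vy[i] += 9.8 * dt;
        w->y[i] += w->vy[i] * dt;
    }
}

/* Simulation thread: step and publish a snapshot until stopped. */
static void *sim_loop(void *arg)
{
    World *w = arg;
    for (;;) {
        pthread_mutex_lock(&w->lock);
        if (!w->running) {
            pthread_mutex_unlock(&w->lock);
            return NULL;
        }
        world_step(w, 1.0 / 60.0);
        for (size_t i = 0; i < w->count; ++i) {
            w->out[2 * i] = w->x[i];
            w->out[2 * i + 1] = w->y[i];
        }
        pthread_mutex_unlock(&w->lock);
    }
}

int world_start(World *w)
{
    memset(w, 0, sizeof *w);
    w->running = 1;
    if (pthread_mutex_init(&w->lock, NULL) != 0)
        return -1;
    int rc = pthread_create(&w->thread, NULL, sim_loop, w);
    if (rc != 0)
        pthread_mutex_destroy(&w->lock);
    return rc;
}

/* Safe to call from the AIR thread while the simulation runs. */
int world_add_box(World *w, double x, double y)
{
    int added = 0;
    pthread_mutex_lock(&w->lock);
    if (w->count < MAX_BODIES) {
        size_t k = w->count++;
        w->x[k] = x;
        w->y[k] = y;
        w->vy[k] = 0.0;
        w->out[2 * k] = x;     /* publish now so snapshots stay consistent */
        w->out[2 * k + 1] = y;
        added = 1;
    }
    pthread_mutex_unlock(&w->lock);
    return added;
}

/* Copies the latest snapshot into dst (e.g. the acquired ByteArray bytes). */
size_t world_snapshot(World *w, double *dst, size_t max_doubles)
{
    pthread_mutex_lock(&w->lock);
    size_t n = w->count * 2;
    if (n > max_doubles)
        n = max_doubles;
    memcpy(dst, w->out, n * sizeof(double));
    pthread_mutex_unlock(&w->lock);
    return n;
}

void world_stop(World *w)
{
    pthread_mutex_lock(&w->lock);
    w->running = 0;
    pthread_mutex_unlock(&w->lock);
    pthread_join(w->thread, NULL);
    pthread_mutex_destroy(&w->lock);
}
```

Two caveats of this design: the loop keeps one core busy even when AIR doesn't need new data, and POSIX mutexes make no fairness guarantee, so the simulation thread can reacquire the lock repeatedly while the AIR thread waits in `world_snapshot`.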

The mutexes were giving me trouble so I basically implemented my own which you can see below... and laugh at :)

However, it does work, just not as fast as I would like it to. At around 90 boxes we slow down again.

Anyone have any thoughts or pointers? It'd be greatly appreciated!

C Code

The parser was acting up, so I've pasted it here: http://pastebin.com/eBQGuGJX

AS3 Code

Same problem with the parser. I've only included the relevant method, the one that runs every frame in AS3: http://pastebin.com/R1Qs2Tyt

Jon asked Oct 24 '22 07:10


1 Answer

I had forgotten I had this question. Fortunately I have figured it out.

The idea of using mutexes etc. was over-engineered in the first place, and unnecessary.

Since we're running in Flash, everything runs on the main thread. This means that for each "frame", Flash natively handles any media, then runs the client code we have written, then actually renders to the screen, and finally does any garbage collection if necessary.

I don't actually need to have the physics sim simulating forever, I simply need to have it be one step ahead of my client code.

So what happens now is that when the client calls into the ANE to set up the world, it creates a new thread that simulates the world and returns immediately to Flash. Flash then continues with its work of executing the rest of the client code, then rendering, then GC.

Then, on each frame, Flash simply calls into the ANE to retrieve the results. If the simulation thread hasn't finished yet, we wait for it via a join, extract the values and return them to Flash, making sure to spawn another thread for the next step before returning, of course.
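The one-step-ahead scheme can be sketched in C as follows. This is a minimal illustration under stated assumptions rather than the code used in the extension: `Sim`, `sim_init`, `sim_add_box`, `sim_frame` and the gravity-only `step_once` (standing in for `b2World::Step`) are hypothetical names. No mutex is needed because the main thread only touches the world after `pthread_join` and before the next `pthread_create`; boxes added from AS3 are queued and applied in that window.

```c
#include <pthread.h>
#include <stddef.h>
#include <string.h>

#define MAX_BODIES 256

/* Hypothetical stand-in world; the real extension would wrap a b2World. */
typedef struct {
    double x[MAX_BODIES], y[MAX_BODIES], vy[MAX_BODIES];
    size_t count;
} World;

typedef struct {
    World world;                  /* owned by the worker while a step is in flight */
    double pend_x[MAX_BODIES], pend_y[MAX_BODIES];
    size_t pend_count;            /* boxes queued by AS3, main thread only */
    pthread_t worker;
    int step_in_flight;
} Sim;

/* Worker body: one simulation step (stand-in for b2World::Step). */
static void *step_once(void *arg)
{
    World *w = arg;
    const double dt = 1.0 / 60.0;
    for (size_t i = 0; i < w->count; ++i) {
        w->vy[i] += 9.8 * dt;
        w->y[i] += w->vy[i] * dt;
    }
    return NULL;
}

static void sim_begin_step(Sim *s)
{
    s->step_in_flight =
        pthread_create(&s->worker, NULL, step_once, &s->world) == 0;
}

static void sim_finish_step(Sim *s)
{
    if (s->step_in_flight) {
        pthread_join(s->worker, NULL); /* returns at once if the step already finished */
        s->step_in_flight = 0;
    }
}

/* Set up the world and start the first step, then return to Flash. */
int sim_init(Sim *s)
{
    memset(s, 0, sizeof *s);
    sim_begin_step(s);
    return s->step_in_flight ? 0 : -1;
}

/* Queue a box; it is added between steps, so no mutex is required. */
int sim_add_box(Sim *s, double x, double y)
{
    if (s->pend_count >= MAX_BODIES)
        return 0;
    s->pend_x[s->pend_count] = x;
    s->pend_y[s->pend_count] = y;
    s->pend_count++;
    return 1;
}

/* Once per Flash frame: wait for the step, apply queued boxes, copy [x, y]
 * pairs into dst, then start the next step before returning. */
size_t sim_frame(Sim *s, double *dst, size_t max_doubles)
{
    sim_finish_step(s);
    for (size_t i = 0; i < s->pend_count && s->world.count < MAX_BODIES; ++i) {
        size_t k = s->world.count++;
        s->world.x[k] = s->pend_x[i];
        s->world.y[k] = s->pend_y[i];
        s->world.vy[k] = 0.0;
    }
    s->pend_count = 0;

    size_t n = 0;
    for (size_t i = 0; i < s->world.count && n + 2 <= max_doubles; ++i) {
        dst[n++] = s->world.x[i];
        dst[n++] = s->world.y[i];
    }
    sim_begin_step(s); /* runs while Flash renders and collects garbage */
    return n;
}

void sim_shutdown(Sim *s)
{
    sim_finish_step(s);
}
```

`pthread_create` and `pthread_join` are among the functions POSIX specifies as synchronizing memory between threads, which is what makes the plain, unlocked accesses in `sim_frame` safe here.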

In this way we are maximizing our efficiency since the simulation is happening while Flash is busy doing other things we don't have control over (like rendering and GC).

The good news is that performance almost doubles with this approach, going from approximately 90 boxes in a synchronous pure AS3 implementation to approximately 170 boxes with the threaded ANE approach.

The bottleneck eventually becomes iterating through the data coming back from the ANE and assigning those values to the display objects.

I hope this helps someone else who was looking for something similar. I'll be giving a talk about it at FITC Toronto at the end of April so there may be more information and material I can post then.

http://www.fitc.ca/events/presentations/presentation.cfm?event=124&presentation_id=1973

Jon answered Oct 26 '22 22:10