I'm currently developing a 2D game with C#/XNA. The game's core feature are bullets with vastly different behaviors (it will be kind of a bullet hell game). Updating all the bullets can take quite some time since they can be infinitely complex with their behaviors. And all of them have to do 1 collision check. Originally I just stored them in a List and updated and drew all of them, removing inactive bullets from the list each frame. This however quickly proved to slow the game down when there where 8k bullets on the screen, so I decided to implement multithreading and using LINQ to help performance.
Thing is it's still slowing down at around 16k bullets. I was told I could achieve up to 7 MILLION active bullets if I did it right, so I'm not satisfied with 16k...
Is there anything else I could do to improve performance here?
Additional information before code: My bullets have fields for velocity, direction, angular velocity, acceleration, a speedlimit and a Behavior. The only special thing as mentioned is the behavior. It can modify any of the bullets fields at any time or spawn more bullets and even plant itself in them, therefore I'm having a hard time applying a Data-Driven Solution and just storing all these fields in arrays instead of having a list of bullets.
internal class BulletManager : GameComponent
{
public static float CurrentDrawDepth = .82f;
private readonly List<Bullet> _bullets = new List<Bullet>();
private readonly int _processorCount;
private int _counter;
private readonly Task[] _tasks;
public BulletManager(Game game)
: base(game)
{
_processorCount = VariableProvider.ProcessorCount;
_tasks = new Task[_processorCount];
}
public void ClearAllBullets()
{
_bullets.Clear();
}
public void AddBullet(Bullet bullet)
{
_bullets.Add(bullet);
}
public override void Update(GameTime gameTime)
{
if (StateManager.GameState != GameStates.Ingame &&
(StateManager.GameState != GameStates.Editor || EngineStates.GameStates != EEngineStates.Running))
return;
var bulletCount = _bullets.Count;
var bulletsToProcess = bulletCount / _processorCount;
//Split up the bullets to update among all available cores using Tasks and a lambda expression
for (var i = 0; i < _processorCount; ++i )
{
var x = i;
_tasks[i] = Task.Factory.StartNew( () =>
{
for(var j = bulletsToProcess * x; j < bulletsToProcess * x + bulletsToProcess; ++j)
{
if (_bullets[j].Active)
_bullets[j].Update();
}
});
}
//Update the remaining bullets (if any)
for (var i = bulletsToProcess * _processorCount; i < bulletCount; ++i)
{
if (_bullets[i].Active)
_bullets[i].Update();
}
//Wait for all tasks to finish
Task.WaitAll(_tasks);
//This is an attempt to reduce the load per frame, originally _bullets.RemoveAll(s => !s.Active) ran every frame.
++_counter;
if (_counter != 300) return;
_counter = 0;
_bullets.RemoveAll(s => !s.Active);
}
public void Draw(SpriteBatch spriteBatch)
{
if (StateManager.GameState != GameStates.Ingame && StateManager.GameState != GameStates.Editor) return;
spriteBatch.DrawString(FontProvider.GetFont("Mono14"), _bullets.Count.ToString(), new Vector2(100, 20),
Color.White);
//Using some LINQ to only draw bullets in the viewport
foreach (var bullet in _bullets.Where(bullet => Camera.ViewPort.Contains(bullet.CircleCollisionCenter.ToPoint())))
{
bullet.Draw(spriteBatch);
CurrentDrawDepth -= .82e-5f;
}
CurrentDrawDepth = .82f;
}
}
Wow. There is a lot wrong with that code you posted (and possibly the code you didn't post). Here is what you need to do to improve performance, roughly in descending order of importance/necessity:
Measure performance. At the very basic level a frame-rate counter (or, better yet, a frame-time counter). You want to check that you are making things better.
Don't allocate memory during your game loop. The best way to check if you are is to use the CLR Profiler. While you may not be using new
(to allocate class
types, structs
are ok), it would not surprise me if much of that LINQ is allocating memory behind-the-scenes.
Note that ToString
will allocate memory. There are allocation-free ways (using StringBuilder
) to draw numbers if you need them.
This article gives more info.
Don't use LINQ. LINQ is easy and convenient and absolutely not the fastest or most memory-efficient way to manipulate collections.
Use a data-driven approach. The key idea behind a data-driven approach is that you maintain cache-coherency (more info). That is: all your Bullet
data is stored linearly in memory. To do this, make sure Bullet
is a struct
and you store them in a List<Bullet>
. This means that when one Bullet
gets loaded into the CPU cache, it brings others along with it (memory is loaded into cache in large blocks), reducing the time the CPU spends waiting for memory to load.
To remove bullets quickly, overwrite the one you are removing with the last bullet in the list, then remove the last element. This allows you to remove elements without copying most of the list.
Use SpriteBatch
with a mind for performance. Do a separate batch of sprites (Begin()/End()
block) for your bullets. Use SpriteSortMode.Deferred
- it is by far the fastest mode. Doing a sort (as implied by your CurrentDrawDepth
) is slow! Ensure all your bullets are using the same texture (use a texture atlas if necessary). Remember that batching is only a performance improvement if consecutive sprites share a texture. (More info)
If you are using SpriteBatch
well, then it will probably be faster to draw all of your sprites and then let the GPU cull them if they are off-screen.
(Optional) Maintain a different list for each behaviour. This reduces the amount of branching in your code and could potentially make the code itself (ie: instructions, not data) more cache-coherent. Unlike the above points, this will only give a small performance improvement, so only implement it if you need to.
(NOTE: Beyond this point these changes are difficult to implement, will make your code harder to read, and could even make it slower. Only implement these if absolutely necessary and you are measuring performance.)
(Optional) Inline your code. Once you start getting into the many-thousands of bullets, you may need to inline your code (remove method calls) to squeeze out even more performance. The C# compiler doesn't inline, and the JIT only does it a little bit, so you need to inline by-hand. Method calls includes things like the +
and *
operators you might use on vectors - inlining these will improve performance.
(Optional) Use a custom shader. If you want even more performance than simply using SpriteBatch
, write a custom shader that takes your Bullet
data and calculates as much as possible on the GPU.
(Optional) Make your data even smaller and (if possible) immutable. Store your initial conditions (position, direction, time-stamp) in your Bullet
struct. Then use the basic equations of motion to calculate the current position/velocity/etc only as you need them. You can often get these calculations for "free" - as you probably have unused CPU time while it is waiting for memory.
If your data is immutable, then you can avoid transferring it onto the GPU each frame! (If you are adding/removing bullets, you'll have to update it on the GPU on those frames, though).
If you implemented all of these items, I imagine you probably could get up to 7 million bullets on a good machine. Although that would probably leave little CPU time left over for the rest of your game.
Why are you removing inactive bullets?
I think this type of thing is often times solved by the concept of having a "pool" - maybe I'm missing something from your code, but it seems like you already have a concept of active, so why remove an inactive one to then create A NEW bullet which will at some point be removed again to be handled by GC. Just reuse an inactive bullet.
Also, I can't tell you how much it hurts, but the usage of ToString() in your draw 30 times a second generates garbage to be cleaned up.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With