I am working on optimizing a physics simulation program using Red Gate's Performance Profiler. One part of the code dealing with collision detection had around 52 of the following little checks, dealing with cells in 26 directions in 3 dimensions, under two cases.
CollisionPrimitiveList cell = innerGrid[cellIndex + 1];
if (cell.Count > 0)
contactsMade += collideWithCell(obj, cell, data, ref attemptedContacts);
cell = innerGrid[cellIndex + grid.XExtent];
if (cell.Count > 0)
contactsMade += collideWithCell(obj, cell, data, ref attemptedContacts);
cell = innerGrid[cellIndex + grid.XzLayerSize];
if (cell.Count > 0)
contactsMade += collideWithCell(obj, cell, data, ref attemptedContacts);
As an extremely tight loop of the program, all of this had to be in the same method, but I found that, suddenly, after I had extended the area from two dimensions to three dimensions (rising the count to 52 checks from 16), suddenly cell.Count was no longer being inlined, even though it is a simple getter.
public int Count { get { return count; } }
This caused a humongous performance hit, and it took me a considerable time to find that, when cell.Count appeared in the method 28 times or less, it was inlined every time, but once cell.Count appeared in the method 29 times or more, it was not inlined a single time (even though the vast majority of calls were from worst-case scenario parts of the code that were rarely executed.)
So back to my question, does anybody have any idea to get around this limit? I think the easy solution is just to make the count field internal and not private, but I would like a better solution than this, or at least just a better understanding of the situation. I wish this sort of thing would have been mentioned on Microsoft's Writing High-Performance Managed Applications page at http://msdn.microsoft.com/en-us/library/ms973858.aspx but sadly it is not (possibly because of how arbitrary the 28 count limit is?)
I am using .NET 4.0.
EDIT: It looks like I misinterpreted my little testing. I found that the failure to inline was caused not by the methods themselves being called some 28+ times, but because the the method they ought to be inlined into is "too long" by some standard. This still confuses me, because I don't see how a simple getter could be rationally not inlined (and performance is significantly better with them inlined as my profiler clearly shows me), but apparently the CLI JIT compiler is refusing to inline anything just because the method is already large (playing around with slight variations showed me that this limit is a code size (from idasm) of 1500, above which no inlining is done, even in the case of my getters, which some testing showed add no additional code overhead to be inlined).
Thank you.
I haven't tested this, but it seems like one possible workaround is to have multiple properties that all return the same thing. Conceivably you could then get 28 inlines per property.
Note that the number of times a method is inlined most likely depends on the size of native code for that method (See http://blogs.msdn.com/b/vancem/archive/2008/08/19/to-inline-or-not-to-inline-that-is-the-question.aspx), the the number 28 is specific to that one property. A simple property would likely get inlined more times than a more complex method.
Straight off, this doesn't explain why 28 is the magic number, but I'm curious what would happen if you collate all your candidate CollisionListPrimitive instances into an array, and then call your "if count > 0" block within a loop of the array?
Is the cell.Count call then made inline again?
e.g.
CollisionPrimitiveList[] cells = new CollisionPrimitiveList {
innerGrid[cellIndex + 1],
innerGrid[cellIndex + grid.XExtent],
innerGrid[cellIndex + grid.XzLayerSize]
// and all the rest
};
// Loop over cells - for demo only. Use for loop or LINQ'ify if faster
foreach (CollisionPrimitiveList cell in cells)
{
if (cell.Count > 0)
contactsMade += collideWithCell(obj, cell, data, ref attemptedContacts);
}
I know performance is the issue, and you'll have overheads constructing the array and looping through it, but if cell.Count is inline again, might the performance still be better / good enough overall?
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With