 

Cost of a virtual function in a tight loop

I am in a situation where I have game objects that have a virtual function Update(). There are a lot of game objects (currently a little over 7000), and the loop calls Update() on all of them (amongst other things). My colleague suggested that we remove the virtual function altogether. As you can imagine, this would take a lot of refactoring.

I have seen this answer, but in my case profiling means I have to change a lot of code. So before I even think of starting, I thought I'd ask here whether the refactoring is worth it in this case.

Note that I have profiled other parts of the loop and have been optimizing the parts that take the longest. I suspect the virtual function calls are not something I should worry about in this case, but I cannot be sure until I profile, and I cannot profile until I change the code (which is a lot of work). Also note that some update functions are very small, while others are larger and more complex.

EDIT: Several answers give great insight, so if you stumble onto this question in the future, have a look at all of them and not just the accepted one.

Samaursa asked Jul 06 '11 15:07


3 Answers

A virtual function call adds little more than a single indirection and a hard-to-predict jump. That means you usually lose at most one pipeline flush, or about 20 cycles, per virtual call. 7000 of them come to about 140,000 cycles, which should be negligible compared to the cost of your average update function. If it isn't (say, most of your update functions are just empty), you can consider putting only the objects that actually need updating in a separate list for this purpose.
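A minimal sketch of the separate-list idea, assuming a hypothetical `GameObject` base class with a `NeedsUpdate()` query (the class layout, `NeedsUpdate()`, `Mover`, `Scenery` and `World` are illustrative and do not come from the question). Objects whose `Update()` does nothing never enter the update list, so the loop only pays for a virtual call where there is real work:

```cpp
#include <memory>
#include <utility>
#include <vector>

// Hypothetical base class; the question only says objects have a virtual Update().
class GameObject {
public:
    virtual ~GameObject() = default;
    virtual void Update() {}                            // default: nothing to do
    virtual bool NeedsUpdate() const { return false; }  // hypothetical query
};

// Hypothetical object type whose Update() actually does work.
class Mover : public GameObject {
public:
    void Update() override { ++x; }
    bool NeedsUpdate() const override { return true; }
    int x = 0;
};

// Hypothetical object type with an empty Update().
class Scenery : public GameObject {};

struct World {
    std::vector<std::unique_ptr<GameObject>> all;  // owns every object
    std::vector<GameObject*> updatable;            // only objects with real work

    void Add(std::unique_ptr<GameObject> obj) {
        // Decide once, at insertion time, whether the object belongs in the update list.
        if (obj->NeedsUpdate()) updatable.push_back(obj.get());
        all.push_back(std::move(obj));
    }

    // Virtual calls are made only for objects that do something in Update().
    void UpdateAll() {
        for (GameObject* obj : updatable) obj->Update();
    }
};
```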

Removing the virtual function will most likely just lead one of you to replace it with an equivalent, hand-written dispatch mechanism. This is exactly the kind of place where a virtual function makes sense.
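To illustrate what such a self-implemented system tends to look like, here is a hypothetical hand-rolled version that stores a function pointer in each object (`Object`, `UpdateFn`, `UpdateFast` and `UpdateSlow` are illustrative names). It still makes one indirect call per object, so it buys little over the vtable the compiler already generates:

```cpp
#include <vector>

// Hypothetical hand-rolled replacement for a virtual Update().
struct Object;
using UpdateFn = void (*)(Object&);

struct Object {
    UpdateFn update;  // plays the role of a vtable entry; unlike a real vtable
                      // (object -> vtable -> function), it lives in the object itself
    int counter;
};

void UpdateFast(Object& o) { o.counter += 2; }
void UpdateSlow(Object& o) { o.counter += 1; }

// One indirect call per object, just like a virtual call.
void UpdateAll(std::vector<Object>& objects) {
    for (Object& o : objects) o.update(o);
}
```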

For reference, 140,000 cycles is about 50 microseconds. That estimate assumes a P4, with its very long pipeline, and a full pipeline flush on every call (which you usually don't get).
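The arithmetic behind these figures, assuming a clock of about 2.8 GHz (the answer names no clock speed; 2.8 GHz is the rate at which 140,000 cycles take 50 µs) and, as a further assumption, a 60 fps frame budget of about 16.7 ms:

```latex
\begin{aligned}
7000 \times 20\ \text{cycles} &= 140\,000\ \text{cycles} \\
\frac{140\,000\ \text{cycles}}{2.8 \times 10^{9}\ \text{cycles/s}} &= 5 \times 10^{-5}\ \text{s} = 50\ \mu\text{s} \\
\frac{50\ \mu\text{s}}{16.7\ \text{ms}} &\approx 0.3\%
\end{aligned}
```

Even in this pessimistic case, the virtual calls would use roughly 0.3% of a frame.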

dascandy answered Nov 01 '22 14:11


Although it's not the same code and may not be the same compiler as you're using, here's a bit of reference data from a rather old benchmark (bench++ by Joe Orost):

Test Name:   F000005                         Class Name:  Style
CPU Time:        7.70  nanoseconds           plus or minus      0.385
Wall/CPU:        1.00  ratio.                Iteration Count:  1677721600
Test Description:
 Time to test a global using a 10-way if/else if statement
 compare this test with F000006


Test Name:   F000006                         Class Name:  Style
CPU Time:        2.00  nanoseconds           plus or minus     0.0999
Wall/CPU:        1.00  ratio.                Iteration Count:  1677721600
Test Description:
 Time to test a global using a 10-way switch statement
 compare this test with F000005


Test Name:   F000007                         Class Name:  Style
CPU Time:        3.41  nanoseconds           plus or minus      0.171
Wall/CPU:        1.00  ratio.                Iteration Count:  1677721600
Test Description:
 Time to test a global using a 10-way sparse switch statement
 compare this test with F000005 and F000006


Test Name:   F000008                         Class Name:  Style
CPU Time:        2.20  nanoseconds           plus or minus      0.110
Wall/CPU:        1.00  ratio.                Iteration Count:  1677721600
Test Description:
 Time to test a global using a 10-way virtual function class
 compare this test with F000006

This particular result is from compiling with the 64-bit edition of VC++ 9.0 (VS 2008), but it's reasonably similar to what I've seen from other recent compilers. In short, the virtual function is faster than most of the obvious alternatives and very close to the speed of the only one that beats it (in fact, the difference between the two is within the measured margin of error). That, however, depends on the values involved being dense: as you can see in F000007, if the values are sparse, the switch statement produces code that's slower than the virtual function call (a dense switch can be compiled into a jump table, while a sparse one often turns into a series of comparisons).
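For comparison, here is a rough sketch of a type tag with a `switch` next to the virtual-function version (the `Kind` enum and the object types are hypothetical, and this is not the bench++ source). Both compute the same result; only the dispatch mechanism differs:

```cpp
#include <memory>
#include <vector>

// Hypothetical object kinds; a real game would have its own.
enum class Kind { Player, Enemy, Bullet };

// Alternative 1: a type tag plus a switch. Dense values like these can let
// the compiler use a jump table; sparse values often cannot.
struct TaggedObject {
    Kind kind;
    int state;
};

void UpdateTagged(TaggedObject& o) {
    switch (o.kind) {
        case Kind::Player: o.state += 1;   break;
        case Kind::Enemy:  o.state += 10;  break;
        case Kind::Bullet: o.state += 100; break;
    }
}

// Alternative 2: the existing design, a virtual function.
struct VirtualObject {
    virtual ~VirtualObject() = default;
    virtual void Update() = 0;
    int state = 0;
};
struct Player : VirtualObject { void Update() override { state += 1; } };
struct Enemy  : VirtualObject { void Update() override { state += 10; } };
struct Bullet : VirtualObject { void Update() override { state += 100; } };

void UpdateAllVirtual(std::vector<std::unique_ptr<VirtualObject>>& objs) {
    for (auto& o : objs) o->Update();  // one virtual call per object
}
```

With the tagged version, every new object type has to be added to the enum and to every switch, which is part of why such a refactoring rarely pays off.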

Bottom line: The virtual function call is probably the wrong place to look. Refactored code might easily work out slower, and even at best it probably won't gain enough to notice or care about.

Jerry Coffin answered Nov 01 '22 13:11


If you can't profile, have a look at the generated assembly to get an idea of how expensive the lookup really is. It may well be just a load from the vtable followed by an indirect call, which costs almost nothing.
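A minimal file for that kind of inspection might look like this (`Base`, `Derived` and `CallUpdate` are hypothetical names). Compiling it with something like `g++ -O2 -S` and reading the code generated for `CallUpdate` shows how little the dispatch involves:

```cpp
// Hypothetical minimal types for inspecting the generated code, e.g. with
//   g++ -O2 -S dispatch.cpp   (writes dispatch.s)
struct Base {
    virtual ~Base() = default;
    virtual int Update() { return 0; }
};

struct Derived : Base {
    int Update() override { return 1; }
};

// On typical x86-64 builds this compiles to a load of the vtable pointer, a load
// of the function address from the vtable, and an indirect call or jump; some
// compilers also add a speculative check for a likely target.
int CallUpdate(Base& b) {
    return b.Update();
}
```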

If you do need to refactor, here is a suggestion: create one "UpdateXxx" class per object type that knows how to call that type's new non-virtual update() method. Collect those classes in an array and then call update() on each of them.
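One way to read that suggestion, sketched under the assumption that each concrete type gets a non-virtual `update()` (`Bullet`, `Enemy`, `Updater` and `UpdateList` are hypothetical names, and a template stands in for hand-written `UpdateXxx` classes). Each updater owns all objects of one type, so there is one virtual call per type instead of one per object, and the inner loop can be inlined:

```cpp
#include <vector>

// Hypothetical concrete types whose update() is no longer virtual.
struct Bullet { int x = 0;   void update() { x += 3; } };
struct Enemy  { int hp = 10; void update() { hp -= 1; } };

// Common interface for the updaters: one virtual call per type, not per object.
struct Updater {
    virtual ~Updater() = default;
    virtual void update() = 0;
};

// Plays the role of an "UpdateXxx" class: stores objects of one type
// contiguously and calls their non-virtual update(), which can be inlined.
template <typename T>
struct UpdateList : Updater {
    std::vector<T> objects;
    void update() override {
        for (T& obj : objects) obj.update();
    }
};
```

Usage: create, for example, an `UpdateList<Bullet>` and an `UpdateList<Enemy>`, put pointers to them in a `std::vector<Updater*>`, and call `update()` on each once per frame.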

But my guess is that you won't save much, especially not with only 7K objects.

Note on profiling: if you can't use a profiler (which makes me wonder why not), time the calls to update() and log the calls that take longer than, say, 100 ms. The timing is cheap, and it lets you quickly figure out which calls are the most expensive.
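A sketch of that timing approach, assuming a hypothetical `GameObject` interface with a `Name()` method for the log message. The threshold is a parameter here (the answer suggests something like 100 ms), and `std::chrono::steady_clock` is used because it is monotonic:

```cpp
#include <chrono>
#include <cstdio>
#include <vector>

// Hypothetical interface standing in for the question's game objects.
struct GameObject {
    virtual ~GameObject() = default;
    virtual void Update() = 0;
    virtual const char* Name() const = 0;  // hypothetical, used only for logging
};

// Times each Update() call and logs those that take longer than `threshold`.
// Returns the number of calls that were logged.
int UpdateAndLogSlow(const std::vector<GameObject*>& objects,
                     std::chrono::microseconds threshold) {
    using Clock = std::chrono::steady_clock;  // monotonic: unaffected by clock adjustments
    int slow = 0;
    for (GameObject* obj : objects) {
        const auto start = Clock::now();
        obj->Update();
        const auto elapsed = Clock::now() - start;
        if (elapsed > threshold) {
            ++slow;
            const auto us = std::chrono::duration_cast<std::chrono::microseconds>(elapsed);
            std::printf("slow Update(): %s took %lld us\n", obj->Name(),
                        static_cast<long long>(us.count()));
        }
    }
    return slow;
}
```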

Aaron Digulla answered Nov 01 '22 15:11