Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Performance of breaking apart one loop into two loops

Tags:

c++

loops

big-o

Good Day,

Suppose that you have a simple for loop like below...

for(int i=0;i<10;i++)
{
    //statement 1
    //statement 2
}

Assume that statement 1 and statement 2 were O(1). Besides the small overhead of "starting" another loop, would breaking down that for loop into two (not nested, but sequential) loops be as equally fast? For example...

for(int i=0;i<10;i++)
{
    //statement 1
}
for(int i=0;i<10;i++)
{
    //statement 2
}

Why I ask such a silly question is that I have a Collision Detection System(CDS) that has to loop through all the objects. I want to "compartmentalize" the functionality of my CDS system so I can simply call

cds.update(objectlist);

instead of having to break my cds system up. (Don't worry too much about my CDS implementation... I think I know what I am doing, I just don't know how to explain it, what I really need to know is if I take a huge performance hit for looping through all my objects again.

like image 654
Matthew Avatar asked Mar 09 '12 13:03

Matthew


Video Answer


3 Answers

In terms of algorithmic complexity splitting the loops makes no difference.

In terms of real world performance splitting the loops could improve performance, worsen performance or make no difference - it depends on the OS, hardware and - of course - what statement 1 and statement 2 are.

like image 163
JoeG Avatar answered Oct 14 '22 07:10

JoeG


It depends on your application.

Possible Drawbacks (of splitting):

  • your data does not fit into the L1 data cache, therefore you load it once for the first loop and then reload it for the second loop

Possible Gains (of splitting):

  • your loop contains many variables, splitting helps reducing register/stack pressure and the optimizer turns it into better machine code
  • the functions you use trash the L1 instruction cache so the cache is loaded on each iteration, while by splitting you manage into loading it once (only) at the first iteration of each loop

These lists are certainly not comprehensive, but already you can sense that there is a tension between code and data. So it is difficult for us to take an educated/a wild guess when we know neither.

In doubt: profile. Use callgrind, check the cache misses in each case, check the number of instructions executed. Measure the time spent.

like image 41
Matthieu M. Avatar answered Oct 14 '22 05:10

Matthieu M.


With two loops you will be paying for:

  • increased generated code size
  • 2x as many branch predicts
  • depending what the data layout of statement 1 and 2 are you could be reloading data into cache.

The last point could have a huge impact in either direction. You should measure as with any perf optimization.

like image 45
Unknown1987 Avatar answered Oct 14 '22 07:10

Unknown1987