I just read this post, and wonder if we can draw the conclusion that a big loop within a small loop must always run faster than a small loop within a big one, no matter what the code does inside the nested loop? Take an example.
int m, n;
m = 1000000;
n = 10;
Snippet A
for (int i = 0; i < n; i++)
for (int j=0; j < m; j++)
{
DoSomething();
}
Snippet B
for (int j = 0; j < m; j++)
for (int i=0; i < n; i++)
{
DoSomething();
}
Can we say that, no matter what DoSomething() actually does, snippet A always runs faster thant snippet B?
As pointed out by @stackmate, I want to expand this question into two
When the code inside nested loop is DoSomething() which means DoSomething() has nothing to do with variable i and j. What is the performance difference?
When the code inside nested loop is DoSomething(i, j) which means DoSomething(i, j) has relateship with variable i and j. What is the performance difference?
There cannot be a specific answer to your question. The parameter deciding whether it will be fast or not is what you are doing inside the loops. For example say you are adding 2 arrays and storing them in a third array:
Code 1:
for(int i = 0; i < 1000; i++)
{
for(int j = 0; j < 1000000; j++)
C[i][j] = A[i][j] + B[i][j];
}
Code 2:
for(int i = 0; i < 1000000; i++)
{
for(int j = 0; j < 1000; j++)
C[j][i] = A[j][i] + B[j][i];
}
Code 1 will be much faster than code 2. The reason is cache. Take a look at this question for more details. The answers are superbly informative and there is no point in me explaining the concept of cache again over here.
@Cool_Coder already covered one main reason (memory access patterns resulting in better cache hit rates) why having the smaller loop as the inside loop can be beneficial.
Another scenario is that the inner loop could be unrolled. Particularly if the size of the smaller loop is really small and fixed, the compiler will unroll the inner loop if it's beneficial. The resulting code will then have only one loop instead of two nested loops, with a reduction in branches.
If you have a situation like this in highly performance critical code, you need to try both, and benchmark carefully. If the code is not very performance critical, it's probably not worth worrying about.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With