I was going through loops and noticed a significant difference in execution time between two ways of accessing a matrix inside nested loops. What causes such a difference between the two cases?
First Example:
Execution Time: 8 seconds
    for (int kk = 0; kk < 1000; kk++) {
        sum = 0;
        for (int i = 0; i < 1024; i++)
            for (int j = 0; j < 1024; j++) {
                sum += matrix[i][j];
            }
    }
Second Example:
Execution Time: 23 seconds
    for (int kk = 0; kk < 1000; kk++) {
        sum = 0;
        for (int i = 0; i < 1024; i++)
            for (int j = 0; j < 1024; j++) {
                sum += matrix[j][i];
            }
    }
What causes such a large difference in execution time just from swapping matrix[i][j] for matrix[j][i]?
The main difference between for loops and while loops is a matter of pragmatics: we usually use for when the number of iterations is known, and while when the number of iterations is not known in advance.
A while loop is not inherently slower than a for loop: both test their condition before every iteration, and a compiler typically generates the same machine code for equivalent for and while loops. Choose between them for readability, not speed.
Use a for loop when you know the loop should execute n times. Use a while loop for reading a file into a variable. Use a while loop when asking for user input. Use a while loop when the increment value is nonstandard.
In general, you should use a for loop when you know how many times the loop should run. If you want the loop to break based on a condition other than the number of times it runs, you should use a while loop.
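It's an issue of memory caching. A 2D array like matrix is normally stored row by row (row-major order, as in C and C++), so matrix[i][0], matrix[i][1], matrix[i][2], ... sit next to each other in memory. Accessing matrix[i][j] with j in the inner loop therefore walks through memory sequentially and gets far more cache hits than matrix[j][i], which jumps a whole row ahead on every access.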
For example, when we access matrix[i][0], the cache loads the whole cache line containing it (typically 64 bytes, i.e. 16 four-byte ints). The following accesses to matrix[i][1], matrix[i][2], ... are then served from the cache, because they are stored right next to matrix[i][0].
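However, when we access matrix[j][0], it is far from matrix[j - 1][0]: with 1024 columns the two are a full row apart (4096 bytes if the elements are 4-byte ints). That element may not have been cached, so the access cannot benefit from the cache. Moreover, because a matrix is normally stored as one large contiguous block of memory, the hardware prefetcher can recognize a sequential access pattern like the one produced by matrix[i][j] and load upcoming memory before it is needed, which helps the row-order loop even more.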
That's why matrix[i][j] is faster. This is a typical case in CPU cache-based performance optimization: traverse a multidimensional array in the order it is laid out in memory.