I have a matrix of int
s named A
, and when I'm iterating trough it by columns instead of rows, it runs about 50 ms slower:
for(int i=0;i<n;i++)
for(int j=0;j<n;j++)
cout<<A[j][i]; //slower than of A[i][j]
Does anyone know why does this happen? I've asked a couple of people, but none of them knew why. I'm sure it's related to how the addresses are represented in computer's memory, but still, I'd like to find a more concrete answer.
Iterating through the matrix row by row is faster because of cache memory.
When you access A[i][j]
, there is more memory being loaded into the cache than just one element. Note that each row of your matrix is stored within continuous memory block, thus when the memory "around" A[i][j]
is still in cache, it is more probable that accessing next element within the same row will result in it being read from cache rather than main memory (see cache miss).
Also see related questions:
Why does the order of the loops affect performance when iterating over a 2D array?
Which of these two for loops is more efficient in terms of time and cache performance
How cache memory works?
Matrix multiplication: Small difference in matrix size, large difference in timings
A 2D array is stored in memory as a 1D array, in (row/column) major. What this means is that an array with 5 columns, might be stored as 5 columns one after the other, so based on how you access vs this ordering, your accesses might be cached, or every one of them might cause cache fails, causing the large difference in performance.
It's about cache line read mechanism. Read about spatial locality.
To verify, try to disable cache while running this application. (I forgot how to do this, but it can be done.)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With