Here is simple C++ code that compare iterating 2D array row major with column major.
#include <iostream>
#include <ctime>
using namespace std;
const int d = 10000;
int** A = new int* [d];
int main(int argc, const char * argv[]) {
for(int i = 0; i < d; ++i)
A[i] = new int [d];
clock_t ColMajor = clock();
for(int b = 0; b < d; ++b)
for(int a = 0; a < d; ++a)
A[a][b]++;
double col = static_cast<double>(clock() - ColMajor) / CLOCKS_PER_SEC;
clock_t RowMajor = clock();
for(int a = 0; a < d; ++a)
for(int b = 0; b < d; ++b)
A[a][b]++;
double row = static_cast<double>(clock() - RowMajor) / CLOCKS_PER_SEC;
cout << "Row Major : " << row;
cout << "\nColumn Major : " << col;
return 0;
}
Result for different values of d:
d = 10^3 :
Row Major : 0.002431
Column Major : 0.017186
d = 10^4 :
Row Major : 0.237995
Column Major : 2.04471
d = 10^5
Row Major : 53.9561
Column Major : 444.339
Now the question is why row major is faster than column major?
It only shows that in C language, 2-D arrays are stored in row major order and thus iterating its elements in a row major order is more efficient. In languages like Pascal and Fortran, iterating by column major order will be more efficient because 2-D arrays are stored in column major order there.
The difference between the orders lies in which elements of an array are contiguous in memory. In row-major order, the consecutive elements of a row reside next to each other, whereas the same holds true for consecutive elements of a column in column-major order.
Accessing a column-major array in the column-row order will be efficient because the array is stored sequentially in this manner (and the CPU pre-fetches data required next).
On the exam assume that any 2 dimensional (2D) array is in row-major order. The outer array can be thought of as the rows and the inner arrays the columns.
It obviously depends on the machine you're on but very generally speaking:
Your computer stores parts of your program's memory in a cache that has a much smaller latency than main memory (even when compensating for cache hit time).
C arrays are stored in a contiguous by row major order. This means if you ask for element x
, then element x+1
is stored in main memory at a location directly following where x
is stored.
It's typical for your computer cache to "pre-emptively" fill cache with memory addresses that haven't been used yet, but that are locally close to memory that your program has used already. Think of your computer as saying: "well, you wanted memory at address X so I am going to assume that you will shortly want memory at X+1, therefore I will pre-emptively grab that for you and place it in your cache".
When you enumerate your array via row major order, you're enumerating it in such a way where it's stored in a contiguous manner in memory, and your machine has already taken the liberty of pre-loading those addresses into cache for you because it guessed that you wanted it. Therefore you achieve a higher rate of cache hits. When you're enumerating an array in another non-contiguous manner then your machine likely won't predict the memory access pattern you're applying, so it wont be able to pre-emptively pull memory addresses into cache for you, and you won't incur as many cache hits, so main memory will have to be accessed more frequently which is slower than your cache.
Also, this might be better suited for https://cs.stackexchange.com/ because the way your system cache behaves is implemented in hardware, and spatial locality questions seem better suited there.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With