I got this problem on my exam for the subject Principles of Programming Languages. I thought about it for a long time, but I still don't understand it.
Problem: Below is a C program, executed in the MSVC++ 6.0 environment on a PC with the following configuration: CPU Intel 1.8 GHz, RAM 512 MB.
#include <stdio.h>
#include <time.h>

#define M 10000
#define N 5000

int a[M][N];

int main(void) {
    int i, j;
    time_t start, stop;

    // Part A
    start = time(0);
    for (i = 0; i < M; i++)
        for (j = 0; j < N; j++)
            a[i][j] = 0;
    stop = time(0);
    printf("%ld\n", (long)(stop - start));

    // Part B
    start = time(0);
    for (j = 0; j < N; j++)
        for (i = 0; i < M; i++)
            a[i][j] = 0;
    stop = time(0);
    printf("%ld\n", (long)(stop - start));

    return 0;
}
Explain why Part A executes in only 1 second, while Part B takes 8 seconds to finish.
This has to do with how the array's memory is laid out, and how it gets loaded into the cache and accessed. In version A, when one cell of the array is accessed, its neighbors are loaded into the cache along with it, and the code then immediately accesses those neighbors. In version B, one cell is accessed (and its neighbors loaded into the cache), but the next access is far away, on the next row. So a whole cache line is loaded but only one value is used, and a new cache line must be filled for each access. Hence the speed difference.
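To put rough numbers on this (a back-of-the-envelope sketch, not part of the original problem; it assumes a 64-byte cache line and a 4-byte int, which are typical but depend on the machine), you can estimate how many elements share a cache line and roughly how many lines each loop has to fetch:

#include <stdio.h>

#define M 10000
#define N 5000

int main(void) {
    /* Assumed hardware parameters -- adjust for your own CPU. */
    const int cache_line_bytes = 64;
    const int ints_per_line = cache_line_bytes / (int)sizeof(int);  /* typically 16 */

    /* Part A walks memory sequentially: roughly one miss per cache line. */
    long misses_a = (long)M * N / ints_per_line;

    /* Part B jumps a whole row (N ints) between accesses, so each access
       touches a different cache line: roughly one miss per access. */
    long misses_b = (long)M * N;

    printf("ints per cache line:      %d\n", ints_per_line);
    printf("approx. misses in part A: %ld\n", misses_a);
    printf("approx. misses in part B: %ld\n", misses_b);
    return 0;
}

With those assumptions, Part A misses roughly once every 16 writes while Part B misses on essentially every write, which is why it ends up several times slower.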
Row-major order versus column-major order.
Recall first that all multi-dimensional arrays are represented in memory as a contiguous block of memory. Thus the multidimensional array A(m,n) might be represented in memory as
a00 a01 a02 ... a0n a10 a11 a12 ... a1n a20 ... amn
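A quick way to see this layout for yourself (just an illustrative sketch; the indices i = 3, j = 7 are arbitrary, and M and N are taken from the question) is to check that a[i][j] sits at flat offset i*N + j from the start of the array:

#include <stdio.h>

#define M 10000
#define N 5000

int a[M][N];

int main(void) {
    int *base = &a[0][0];
    int i = 3, j = 7;  /* arbitrary element */

    /* In row-major order, a[i][j] is the (i*N + j)-th int in the block. */
    if (&a[i][j] == base + (long)i * N + j)
        printf("a[%d][%d] is at flat index %ld\n", i, j, (long)i * N + j);
    return 0;
}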
In the first loop, you run through this block of memory sequentially. Thus, you run through the array traversing the elements in the following order
a00  a01  a02  ...  a0n  a10  a11  a12  ...  a1n  a20   ...  amn
1    2    3    ...  n    n+1  n+2  n+3  ...  2n   2n+1  ...  mn
In the second loop, you skip around in memory and run through the array traversing the elements in the following order
a00 a10 a20 ... am0 a01 a11 a21 ... am1 a02 ... amn
or, perhaps more clearly,
a00  a01  a02   ...  a10  a11  a12   ...  a20  ...  amn
1    m+1  2m+1  ...  2    m+2  2m+2  ...  3    ...  mn
All that skipping around really hurts you, because you lose the advantages of caching. When you run through the array sequentially, neighboring elements are loaded into the cache along with the one you asked for. When you skip around through the array, you don't get that benefit and instead keep taking cache misses, which hurts performance.
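To attach concrete numbers to that skipping (again just a sketch; it reuses the question's M and N, and the 64-byte cache line mentioned earlier is an assumption): consecutive accesses in the first loop are sizeof(int) = 4 bytes apart, while in the second loop they are N * sizeof(int) = 20000 bytes apart, so every access in the second loop lands on a different cache line:

#include <stdio.h>

#define M 10000
#define N 5000

int a[M][N];

int main(void) {
    /* Byte distance between consecutive accesses in each loop. */
    long stride_a = (char *)&a[0][1] - (char *)&a[0][0];  /* inner loop over j */
    long stride_b = (char *)&a[1][0] - (char *)&a[0][0];  /* inner loop over i */

    printf("part A stride: %ld bytes\n", stride_a);   /* sizeof(int), typically 4 */
    printf("part B stride: %ld bytes\n", stride_b);   /* N * sizeof(int) = 20000 */
    return 0;
}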
It is because of hardware architectural optimizations. Part A operates on sequential memory addresses, which lets the hardware handle the work substantially faster. Part B is basically jumping around in memory the whole time, which defeats a lot of those optimizations.
The key concept in this specific case is the processor cache.