 

Speed of C program execution

I got this problem on my exam for the subject Principles of Programming Languages. I thought about it for a long time but I still don't understand it.

Problem: Below is a C program, executed in the MSVC++ 6.0 environment on a PC with roughly this configuration: Intel CPU at 1.8 GHz, 512 MB of RAM.

#include <stdio.h>
#include <time.h>

#define M 10000
#define N 5000
int a[M][N];

int main(void) {
    int i, j;
    time_t start, stop;

    // Part A
    start = time(0);
    for (i = 0; i < M; i++)
        for (j = 0; j < N; j++)
            a[i][j] = 0;
    stop = time(0);
    printf("%ld\n", (long)(stop - start));

    // Part B
    start = time(0);
    for (j = 0; j < N; j++)
        for (i = 0; i < M; i++)
            a[i][j] = 0;
    stop = time(0);
    printf("%ld\n", (long)(stop - start));

    return 0;
}

Explain why part A executes in only 1 s, while part B takes 8 s to finish.

asked Jul 13 '09 by Minh

3 Answers

This has to do with how the array's memory is laid out and how it gets loaded into the cache and accessed. In version A, when one cell of the array is accessed, its neighbors are loaded into the cache along with it, and the code then immediately accesses those neighbors. In version B, one cell is accessed (and its neighbors are loaded into the cache), but the next access is far away, on the next row, so a whole cache line is loaded but only one value is used, and another cache line must be filled for each access. Hence the speed difference.
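
To put rough numbers on that, here is a back-of-the-envelope sketch, assuming a 64-byte cache line and a 4-byte int (neither figure appears in the question, so treat both as assumptions):

#include <stdio.h>

#define M 10000
#define N 5000

int main(void) {
    int line_size = 64;                                  /* assumed cache line size in bytes */
    int ints_per_line = line_size / (int)sizeof(int);    /* 16 with a 4-byte int */

    /* Part A walks each row in memory order, so one cache-line fill
       serves ints_per_line consecutive stores. */
    long fills_a = (long)M * N / ints_per_line;

    /* Part B jumps N*sizeof(int) = 20000 bytes between stores, far more
       than one line, so in the worst case every store needs its own fill. */
    long fills_b = (long)M * N;

    printf("approx. cache-line fills, part A: %ld\n", fills_a);  /* ~3.1 million */
    printf("approx. cache-line fills, part B: %ld\n", fills_b);  /* 50 million */
    return 0;
}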

answered by Carl Seleborg


Row-major order versus column-major order.

Recall first that a multi-dimensional array is represented in memory as one contiguous block. Thus the array A(m,n) is laid out in memory, row by row (row-major order, which is what C uses), as

a00 a01 a02 ... a0n a10 a11 a12 ... a1n a20 ... amn

In the first loop, you run through this block of memory sequentially. Thus, you run through the array traversing the elements in the following order

a00  a01  a02  ...  a0n  a10  a11  a12  ...  a1n  a20 ... amn

1    2    3         n    n+1  n+2  n+3 ...   2n   2n+1    mn

In the second loop, you skip around in memory and run through the array traversing the elements in the following order

a00  a10  a20  ...  am0  a01  a11  a21  ...  am1  a02  ...  amn

or, perhaps more clearly,

a00  a01  a02  ...  a10  a11  a12  ...  a20 ... amn
1    m+1  2m+1      2    m+2  2m+2      3       mn

All that skipping around really hurts because you lose the benefit of caching. When you run through the array sequentially, neighboring elements are loaded into the cache together. When you skip around the array, you don't get that benefit and instead take a cache miss on nearly every access, which hurts performance.
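
If you want to see the row-major layout for yourself, a small sketch along these lines (the 4x5 dimensions are purely illustrative) prints the element offset i*N + j of every a[i][j] and the byte stride of each loop's inner step:

#include <stdio.h>

#define M 4
#define N 5

int a[M][N];

int main(void) {
    int i, j;

    /* Offset of a[i][j] from a[0][0], in elements: always i*N + j,
       because C lays the array out row by row. */
    for (i = 0; i < M; i++)
        for (j = 0; j < N; j++)
            printf("a[%d][%d] is element %ld\n", i, j,
                   (long)(((char *)&a[i][j] - (char *)&a[0][0]) / sizeof(int)));

    /* Inner-loop stride of each version, in bytes. */
    printf("part A step: %ld bytes\n", (long)((char *)&a[0][1] - (char *)&a[0][0]));  /* sizeof(int)   */
    printf("part B step: %ld bytes\n", (long)((char *)&a[1][0] - (char *)&a[0][0]));  /* N*sizeof(int) */
    return 0;
}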

answered by jason


Because of hardware architectural optimizations. Part A operates on sequential memory addresses, which lets the hardware handle those accesses substantially faster. Part B is basically jumping around in memory all the time, which defeats a lot of those hardware optimizations.

The key concept for this specific case is the processor cache.
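
If you want to reproduce the measurement yourself, a directly runnable variant of the exam program might look like the sketch below, using clock() instead of time() for finer resolution; on a modern machine both parts finish far faster than the quoted 1 s and 8 s, but part B should still be noticeably slower:

#include <stdio.h>
#include <time.h>

#define M 10000
#define N 5000

static int a[M][N];   /* about 200 MB with a 4-byte int, hence file scope rather than the stack */

int main(void) {
    int i, j;
    clock_t start;

    start = clock();                          /* Part A: row-wise traversal */
    for (i = 0; i < M; i++)
        for (j = 0; j < N; j++)
            a[i][j] = 0;
    printf("part A: %.3f s\n", (double)(clock() - start) / CLOCKS_PER_SEC);

    start = clock();                          /* Part B: column-wise traversal */
    for (j = 0; j < N; j++)
        for (i = 0; i < M; i++)
            a[i][j] = 0;
    printf("part B: %.3f s\n", (double)(clock() - start) / CLOCKS_PER_SEC);

    return 0;
}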

answered by chaos