Why for loop order effect running time in matrix multiplication?

Question

I'am writing a C program to calculate the product of two matrix. The problem That I noticed that the order of for loops does matter. For example:

for N=500

  for (int i = 0; i < N; ++i) {
    for (int j = 0; j < N; ++j) {
      for (int k = 0 ; k < N; ++k) {
       C[i*N+j]+=A[i*N+k] * B[k*N+j];
       }
     }
   }

execution time (Seconds) : 1.1531820000

   for (int j = 0; j < N; ++j) {
     for (int k = 0 ; k < N; ++k) {
       for (int i = 0; i < N; ++i) {
       C[i*N+j]+=A[i*N+k] * B[k*N+j];
       }
     }
   }

execution time (Seconds) : 2.6801300000

Matrix declaration:

      A=(double*)malloc(sizeof(double)*N*N);
      B=(double*)malloc(sizeof(double)*N*N);
      C=(double*)malloc(sizeof(double)*N*N);

I run the test for 5 time than calculate the average. Anyone have an idea why is this happening?

tux3 · Accepted Answer

With the second loop, you keep making many big jumps all the time when you increment i in the inner loop, and to a lesser extent k. The cache is probably not very happy with that. The first loop is better, indeed it's even better if you invert the orders of j and k.

This is essentially a problem of data locality. Accesses to main memory are very slow on modern architectures, so your CPU will keep caches of recently accessed memory and try to prefetch memory that is likely to be accessed next. Those caches are very efficient at speeding up accesses that are grouped in the same small area, or accesses that follow a predictable pattern.

Here we turned a pattern where the CPU would make big jumps through memory and then come back into a nice mostly sequential pattern, hence the speedup.

Why for loop order effect running time in matrix multiplication?

Tags:

c

for-loop

matrix

Chaker

1 Answers

tux3

Recent Activity

Donate For Us

Why for loop order effect running time in matrix multiplication?

Tags:

c

for-loop

matrix

Chaker

1 Answers

tux3

Related questions

Recent Activity

Donate For Us