I'am writing a C program to calculate the product of two matrix. The problem That I noticed that the order of for loops does matter. For example:
for N=500
for (int i = 0; i < N; ++i) {
for (int j = 0; j < N; ++j) {
for (int k = 0 ; k < N; ++k) {
C[i*N+j]+=A[i*N+k] * B[k*N+j];
}
}
}
execution time (Seconds) : 1.1531820000
for (int j = 0; j < N; ++j) {
for (int k = 0 ; k < N; ++k) {
for (int i = 0; i < N; ++i) {
C[i*N+j]+=A[i*N+k] * B[k*N+j];
}
}
}
execution time (Seconds) : 2.6801300000
Matrix declaration:
A=(double*)malloc(sizeof(double)*N*N);
B=(double*)malloc(sizeof(double)*N*N);
C=(double*)malloc(sizeof(double)*N*N);
I run the test for 5 time than calculate the average. Anyone have an idea why is this happening?
With the second loop, you keep making many big jumps all the time when you increment i in the inner loop, and to a lesser extent k. The cache is probably not very happy with that. The first loop is better, indeed it's even better if you invert the orders of j and k.
This is essentially a problem of data locality. Accesses to main memory are very slow on modern architectures, so your CPU will keep caches of recently accessed memory and try to prefetch memory that is likely to be accessed next. Those caches are very efficient at speeding up accesses that are grouped in the same small area, or accesses that follow a predictable pattern.
Here we turned a pattern where the CPU would make big jumps through memory and then come back into a nice mostly sequential pattern, hence the speedup.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With