Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is it preferred to access the first dimension first than to access the second dimension of a 2 dimension array?

Tags:

c++

c

Here is the code,

int array[X][Y] = {0,};

// 1 way to access the data
for (int x = 0; x < X; x++)
  for(int y = 0; y < Y; y++)
    array[x][y] = compute();

// the other way to access the data
for (int y = 0; y < Y; y++)
  for (int x = 0; x < X; x++)
    array[x][y] = compute();

Is it true that the first way is more efficient than the second one since the CPU cache(L1, L2?) optimization? In other words, whether sequential access pattern is preferred even for RAM?

like image 587
Thomson Avatar asked Dec 22 '22 22:12

Thomson


1 Answers

You'll understand this better if you draw a picture of your array in memory:

  Y ->
X xxxxx ...
| xxxxx
v xxxxx
  .
  .

The adress you access will grow linear in Y direction (345, 345+1, 345+2...), but jumps wildly in X direction if Y is large (345, 345+X, 345+X*2). As the cache loads blocks of memory, you'll jump out of them very soon with Y big enough, but will always be in the cache page when walking in Y direction until the cache has to be refreshed.

Also note that this effect can be more extreme when using dynamical allocation. Using the following program with full optimizations gives me the following output (times in seconds)

0.615000
9.878000

EDIT: Other interesting measures:

Replacing the array code with int array[X][Y]; will use stack memory which is limited, so you can't test much bigger X/Y values, but also very fast:

0.000000
0.000000

Using int array[X][Y]; as a global variable will use a block of heap memory and is slow again. So even without dynamical allocation, the first case is much better:

0.929000
8.944000

Using X=1500, Y=1500 shows that the effect is measurable even with smaller arrays:

0.008000
0.059000

EDIT2: Also note there are other possible optimizations to the code as jalf said in a comment to your question. Using this optimization indeed almost doubles the speed (0.453 seconds for X=Y=10000):

// an even faster way to access the array
for (int x = 0; x < X; x++) {
  int* arrayptr = array[x];
  for (int y = 0; y < Y; y++, arrayptr++)
    *arrayptr = x;
}

Code: (note you can also use this to measure your case where the difference shouldn't be that extreme except for big X and Y. Like others already said, measure this and you'll be enlightened).

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define X 10000
#define Y 10000

int main() {

  int** array = new int*[X];

  for (int x = 0; x < X; x++) {
    array[x] = new int[Y];
  }

  double c = clock();  

  // 1 way to access the data
  for (int x = 0; x < X; x++)
    for(int y = 0; y < Y; y++)
      array[x][y] = x;

  printf("%f\n", (clock() - c) / CLOCKS_PER_SEC);

  c = clock();  

  // the other way to access the data
  for (int y = 0; y < Y; y++)
    for (int x = 0; x < X; x++)
      array[x][y] = x;

  printf("%f\n", (clock() - c) / CLOCKS_PER_SEC);

  for (int x = 0; x < X; x++) {
    delete(array[x]);
  }
  delete(array);
}
like image 188
schnaader Avatar answered May 14 '23 07:05

schnaader