Why does copying a 2D array column by column take longer than row by row in C? [duplicate]

Tags:

arrays

c

#include <stdio.h>
#include <time.h>

#define N  32768

char a[N][N];
char b[N][N];

int main() {
    int i, j;

    /* Note: &a[N][N] points past the end of a; it is printed only to
       show where a's storage ends relative to b. */
    printf("address of a[%d][%d] = %p\n", N, N, (void *)&a[N][N]);
    printf("address of b[%5d][%5d] = %p\n", 0, 0, (void *)&b[0][0]);

    clock_t start = clock();
    for (j = 0; j < N; j++)
        for (i = 0; i < N; i++)
            a[i][j] = b[i][j];
    clock_t end = clock();
    float seconds = (float)(end - start) / CLOCKS_PER_SEC;
    printf("time taken: %f secs\n", seconds);

    start = clock();
    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++)
            a[i][j] = b[i][j];
    end = clock();
    seconds = (float)(end - start) / CLOCKS_PER_SEC;
    printf("time taken: %f secs\n", seconds);

    return 0;
}

Output:

address of a[32768][32768] = 0x80609080
address of b[    0][    0] = 0x601080
time taken: 18.063229 secs
time taken: 3.079248 secs

Why does column-by-column copying take almost 6 times as long as row-by-row copying? I understand that a 2D array is basically an n×n linear array where A[i][j] = A[i*n + j], but using simple algebra I calculated that a Turing machine head (moving over main memory) would have to travel the same distance [formula image missing from the original post] in both cases. Here n×n is the size of the array and x is the distance between the last element of the first array and the first element of the second array.

Asked Dec 15 '15 by Kakaji



2 Answers

It pretty much comes down to this image (source):

[figure: diagram of a CPU cache line, showing adjacent memory being loaded along with the requested value]

When accessing data, your CPU will not only load the single value you asked for, but will also pull adjacent data into the CPU's L1 cache. When you iterate through your array by row, the items that were automatically loaded into the cache are exactly the ones processed next. When you iterate by column, however, each access loads an entire "cache line" of data (the size varies per CPU), uses only a single item from it, and then has to load the next line, effectively making the cache pointless.

The wikipedia entry and, as a high level overview, this PDF should help you understand how CPU caches work.

Edit: chqrlie in the comments is of course correct. One of the relevant factors here is that only very few of your columns fit into the L1 cache at the same time. If your rows were much smaller (say, the total size of your two dimensional array was only some kilobytes) then you might not see a performance impact from iterating per-column.

Answered Sep 19 '22 by fresskoma


While it's normal to draw the array as a rectangle, the addressing of array elements in memory is linear: from 0 up to one less than the number of bytes available (on nearly all machines).

Memory hierarchies (e.g. registers < L1 cache < L2 cache < RAM < swap space on disk) are optimized for the case where memory accesses are localized: accesses that are successive in time touch addresses that are close together. They are even more highly optimized (e.g. with pre-fetch strategies) for sequential access in linear order of addresses; e.g. 100,101,102...

In C, rectangular arrays are arranged in linear order by concatenating all the rows (other languages like FORTRAN and Common Lisp concatenate columns instead). Therefore the most efficient way to read or write the array is to do all the columns of the first row, then move on to the rest, row by row.

If you go down the columns instead, successive touches are N bytes apart, where N is the number of bytes in a row: 100, 10100, 20100, 30100, ... for the case N = 10000 bytes. Then the second column is 101, 10101, 20101, etc. This is the absolute worst case for most cache schemes.

In the very worst case, you can cause a page fault on each access. These days, even on an average machine, it would take an enormous array to cause that. But if it happened, each touch could cost ~10 ms for a disk head seek, whereas sequential access takes a few nanoseconds per element: a difference of over a factor of a million. Computation effectively stops in this case. It has a name: disk thrashing.

In the more normal case, where only cache misses are involved, not page faults, you might see a factor of a hundred. Still worth paying attention to.

Answered Sep 17 '22 by Gene