Why does the order of the loops affect performance when iterating over a 2D array?

Tags: performance, c, optimization, for-loop, cpu-cache

Below are two programs that are almost identical, except that I swapped the order of the i and j loops. They run in different amounts of time. Could someone explain why this happens?

Version 1

#include <stdio.h>
#include <stdlib.h>

int main(void) {
  int i, j;
  static int x[4000][4000];
  /* i is the outer loop, so the first index (j) changes fastest */
  for (i = 0; i < 4000; i++) {
    for (j = 0; j < 4000; j++) {
      x[j][i] = i + j;
    }
  }
  return 0;
}

Version 2

#include <stdio.h>
#include <stdlib.h>

int main(void) {
  int i, j;
  static int x[4000][4000];
  /* j is the outer loop, so the second index (i) changes fastest */
  for (j = 0; j < 4000; j++) {
    for (i = 0; i < 4000; i++) {
      x[j][i] = i + j;
    }
  }
  return 0;
}

asked Mar 30 '12 02:03

Mark

2 Answers

As others have said, the issue is the store to the memory location in the array: x[j][i]. Here's a bit of insight into why:

You have a 2-dimensional array, but memory in the computer is inherently 1-dimensional. So while you imagine your array like this:

0,0 | 0,1 | 0,2 | 0,3
----+-----+-----+----
1,0 | 1,1 | 1,2 | 1,3
----+-----+-----+----
2,0 | 2,1 | 2,2 | 2,3

Your computer stores it in memory as a single line:

0,0 | 0,1 | 0,2 | 0,3 | 1,0 | 1,1 | 1,2 | 1,3 | 2,0 | 2,1 | 2,2 | 2,3

In the 2nd example, the 2nd index is the one that changes fastest (it belongs to the inner loop), so you access the array like this:

x[0][0]
        x[0][1]
                x[0][2]
                        x[0][3]
                                x[1][0] etc...

Meaning that you're hitting them all in order. Now look at the 1st version. You're doing:

x[0][0]
                                x[1][0]
                                                                x[2][0]
        x[0][1]
                                        x[1][1] etc...

Because of the way C laid out the 2-d array in memory, you're asking it to jump all over the place. But now for the kicker: Why does this matter? All memory accesses are the same, right?

No: because of caches. Data from your memory gets brought over to the CPU in little chunks (called 'cache lines'), typically 64 bytes. If you have 4-byte integers, that means you're getting 16 consecutive integers in a neat little bundle. It's actually fairly slow to fetch these chunks of memory; your CPU can do a lot of work in the time it takes for a single cache line to load.

Now look back at the order of accesses: The second example is (1) grabbing a chunk of 16 ints, (2) modifying all of them, (3) repeat 4000*4000/16 times. That's nice and fast, and the CPU always has something to work on.

The first example is (1) grab a chunk of 16 ints, (2) modify only one of them, (3) repeat 4000*4000 times. That's going to require 16 times the number of "fetches" from memory. Your CPU will actually have to spend time sitting around waiting for that memory to show up, and while it's sitting around you're wasting valuable time.

Important Note:

Now that you have the answer, here's an interesting note: there's no inherent reason that your second example has to be the fast one. For instance, in Fortran, the first example would be fast and the second one slow. That's because instead of expanding things out into conceptual "rows" like C does, Fortran expands into "columns", i.e.:

0,0 | 1,0 | 2,0 | 0,1 | 1,1 | 2,1 | 0,2 | 1,2 | 2,2 | 0,3 | 1,3 | 2,3

The layout of C is called 'row-major' and Fortran's is called 'column-major'. As you can see, it's very important to know whether your programming language is row-major or column-major! Here's a link for more info: http://en.wikipedia.org/wiki/Row-major_order

answered Oct 16 '22 22:10

Robert Martin

Nothing to do with assembly. This is due to cache misses.

C multidimensional arrays are stored with the last dimension varying fastest, so elements that differ only in the last index are adjacent in memory. So the first version will miss the cache on every iteration, whereas the second version won't. So the second version should be substantially faster.

See also: http://en.wikipedia.org/wiki/Loop_interchange.

answered Oct 16 '22 20:10

Oliver Charlesworth
