
Why is iterating 2D array row major faster than column major?

Tags:

c++

arrays

iteration

compiler-construction

Here is some simple C++ code that compares iterating over a 2D array in row-major order with iterating in column-major order.

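#include <iostream>
#include <ctime>

using namespace std;

const int d = 10000;

int** A = new int* [d];

int main(int argc, const char * argv[]) {
    // Allocate each row separately. The trailing () zero-initializes the
    // elements, so A[a][b]++ never reads an indeterminate value.
    for(int i = 0; i < d; ++i)
        A[i] = new int [d]();

    // Column-major traversal: the inner loop walks down a column,
    // so every step lands in a different row.
    clock_t ColMajor = clock();

    for(int b = 0; b < d; ++b)
        for(int a = 0; a < d; ++a)
            A[a][b]++;

    double col = static_cast<double>(clock() - ColMajor) / CLOCKS_PER_SEC;

    // Row-major traversal: the inner loop walks along one row,
    // touching adjacent elements in memory.
    clock_t RowMajor = clock();
    for(int a = 0; a < d; ++a)
        for(int b = 0; b < d; ++b)
            A[a][b]++;

    double row = static_cast<double>(clock() - RowMajor) / CLOCKS_PER_SEC;

    cout << "Row Major : " << row;
    cout << "\nColumn Major : " << col;

    // Release the rows, then the table of row pointers.
    for(int i = 0; i < d; ++i)
        delete[] A[i];
    delete[] A;

    return 0;
}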

Results for different values of d (times in seconds):

d = 10^3 :

Row Major : 0.002431

Column Major : 0.017186

d = 10^4 :

Row Major : 0.237995

Column Major : 2.04471

d = 10^5 :

Row Major : 53.9561

Column Major : 444.339

Now the question is: why is row major faster than column major?


asked Nov 15 '15 17:11

Amanita

1 Answer

It depends on the machine you're on, but very generally speaking:

1. Your computer keeps parts of your program's memory in a cache that has a much lower latency than main memory, even when the time spent checking the cache is taken into account.
2. C and C++ arrays are stored contiguously, in row-major order. This means that if you ask for element x, then element x+1 is stored in memory at the location directly following x. In your program each row A[a] is a separate allocation made with new int[d], so the elements within one row are contiguous, while different rows can be far apart.
3. It's typical for your computer to "pre-emptively" fill the cache with memory that hasn't been used yet but that is close to memory your program has already used. Memory moves into the cache a whole cache line at a time (commonly 64 bytes), so a miss on one element brings its neighbours along with it. Think of your computer as saying: "well, you wanted memory at address X, so I am going to assume that you will shortly want memory at X+1, therefore I will pre-emptively grab that for you and place it in your cache".

When you enumerate your array in row-major order, you walk through memory in the same order in which it is stored, and your machine has already taken the liberty of pre-loading those addresses into cache because it guessed that you would want them. You therefore achieve a higher rate of cache hits. When you enumerate the array in a non-contiguous order, here column by column, every access lands in a different row and therefore in a different cache line. Your machine is much less likely to predict that access pattern, so it won't be able to pull the memory into cache ahead of time. You get fewer cache hits, and main memory, which is slower than the cache, has to be accessed more often.

Also, this might be better suited for https://cs.stackexchange.com/, because the way your system's cache behaves is implemented in hardware, and spatial-locality questions seem a better fit there.

answered Oct 10 '22 04:10

David Zorychta
