Why is there a significant difference in this C++ for loop's execution time? [duplicate]

Tags: c++, performance, nested-loops

I was experimenting with loops and found a significant difference in execution time depending on how the array is accessed. I can't understand what causes such a difference between the two cases.

First Example:

Execution Time: 8 seconds

    for (int kk = 0; kk < 1000; kk++) {
        sum = 0;
        for (int i = 0; i < 1024; i++)
            for (int j = 0; j < 1024; j++)
            {
                sum += matrix[i][j];
            }
    }

Second Example:

Execution Time: 23 seconds

    for (int kk = 0; kk < 1000; kk++) {
        sum = 0;
        for (int i = 0; i < 1024; i++)
            for (int j = 0; j < 1024; j++)
            {
                sum += matrix[j][i];
            }
    }

What causes such a large difference in execution time just from changing matrix[i][j] to matrix[j][i]?


asked Oct 27 '14 08:10

Massab

1 Answer

It's a memory cache issue.

matrix[i][j] gets more cache hits than matrix[j][i], because with matrix[i][j] the inner loop walks through contiguous memory.

For example, when we access matrix[i][0], the cache may load a contiguous segment of memory containing matrix[i][0]. The following accesses to matrix[i][1], matrix[i][2], ... then benefit from the cache, since those elements sit right next to matrix[i][0].

However, when we access matrix[j][0], it is far from matrix[j - 1][0], so it may not be cached yet and cannot benefit from the cache. This matters all the more because a matrix is normally stored as one large contiguous block of memory, and the hardware may predict a sequential access pattern and load the data into the cache ahead of time.

That's why matrix[i][j] is faster. This is a typical case in CPU-cache-based performance optimization.


answered Sep 28 '22 00:09

Peixu Zhu
