Multiple nested for loops vs single for loop [closed]

Tags:

I was doing a little speed testing in c++ (MSVS) and got a strange result. I was testing the speed of using one for loop vs multiple nested for loops. Here is the code:

double testX = 0;
// Single loop executes in roughly 0.04 seconds
for( int i = 0; i < 27000000; i++ ){
    testX += 1;
}

// Nested loop executes in roughly 0.03 seconds
for( int x = 0; x < 300; x++ ){
    for( int y = 0; y < 300; y++ ){
        for( int z = 0; z < 300; z++ ){
            testX += 1;
        }
    }
}

As you can see, the speed difference is fairly obvious. I have run this many times, and those are the average times I am seeing (these are timed using glfwGetTime()).

So my question is: why? Is my test method inadequate? Am I using too few loops? I have tried searching google, and the only similar question I could find related his problem to cache coherency, but since these are empty for loops, I didn't think it would really have an effect.

Any help is appriciated :)

Edit: Thanks to the comments, I realized that using empty for loops probably wasn't the best way of testing things... So I have updated my code to do some (very) simple operations to a double. I also am compiling in release mode. However, though the two methods are a lot more similar in times, the second method is still slightly faster.

And yes, this is all the test code (minus the timing/output functions, but those aren't exactly specific to the question).

772

asked Sep 14 '12 04:09

Jamie Syme

2 Answers

The compiler won't "optimize" the loops away when the testX variable is used somewhere later in the code. When I just add one line to the code to output testX, the results are as follows:

single for loop: 1.218 ms
nested for loop: 1.218 ms

This pretty much shows that the compiler converts the nested loop into a single loop, whenever possible. The loop index can be used to prevent that optimisation:

Modifying the code this way

for( int i = 0; i < 27000000; i++ ){
    testX += i;
}

and

for( int x = 0; x < 300; x++ ){
    testX += x;
    for( int y = 0; y < 300; y++ ){
        testX += y;
        for( int z = 0; z < 300; z++ ){
            testX += z;
        }
    }
}

will add a little overhead to the nested loop but the execution time shows

single for loop: 1.224 ms
nested for loop: 1.226 ms

The times given here are averaged over 30.000 loop runs.

Note: The testX += x; only contributes 1 in 90000 and the testX += x; only contributes 1 in 300. Thus the two sections above remain comparable.

Nested loops are not much slower than single loops but your observation that they are faster is not true.

And: The times you show are about 40 times the times I observed. I'd suggest to carefully inspect the compiler settings since I ran the test on a medium speed hardware. Maybe the results of glfwGetTime() are questionable and this is the main reason for your question. Have you tried to use another timing scheme?

Edit: In order to prevent compiler optimisation the loop limit can be choosen to be non constant:

int lmt = rand() % 1 + 300;      // random value 300 or 301 
int big_lmt = lmt * lmt * lmt;   // random value 27000000 or 27270901

for( int i = 0; i < big_lmt; i++ ){
    testX += i;
}

for( int x = 0; x < lmt; x++ ){
    testX += x;
    for( int y = 0; y < lmt; y++ ){
        testX += y;
        for( int z = 0; z < lmt; z++ ){
            testX += z;
        }
    }
}

This avoids compiler predictability.

Results (for a lmt = 300 case to be comparable):

single for loop: 1.213 ms
nested for loop: 1.216 ms

Result:

Nested loops are not faster than single loops.

answered Oct 03 '22 09:10

Arno

If you don't use your for variables (x,y,z) inside your for loop, a smart compiler could (and should) convert your second form in a single for loop without nesting. Unless you prevent such compiler optimization removing static predictability by having the user input the x,y,z values at runtime from stdin, or reading from some stream, etc..

Furthermore, if you don't do something with your testX variable (say, printing it to stdout), a smart compiler could (and should) optimize it away, i.e. remove the dead code altogether.

So what I'm saying is that the benchmark, as it is right now, is somehow ill-formed.

answered Oct 03 '22 11:10

Unai Vivi

Related questions
                            
                                compatibility of c++11 and MPI library
                            
                                Why is my implementation of strcmp not returning the proper value?
                            
                                c++ thread overhead
                            
                                escape sequence? using quotes in Qstring
                            
                                What is "__cdecl"?
                            
                                write to boost::asio socket from different threads
                            
                                cannot convert parameter 1 from 'cli::interior_ptr<Type>' to 'CvCapture **'
                            
                                Creating boost::thread with an std::shared_ptr object instance
                            
                                Thread creation, the CRT and DLL's how is it meant to be done?
                            
                                "error: no match for ‘operator<<" when working with std::string
                            
                                enable_if iterator as a default template parameter?
                            
                                zero value for all types?
                            
                                Difference between + operator and append function for string class object in c++?
                            
                                How to convert an XMLElement to string in TinyXML2
                            
                                Checking existence of a function in a library in linux
                            
                                cannot convert '*void(MyClass::*)(void*) to void*(*)(void*) in pthread_create function
                            
                                how do I extend Lua with a static c++ library?
                            
                                Explicit type casting of object to int *
                            
                                silence -Wparentheses-equality error
                            
                                Access enum values in C++98 and C++11

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Multiple nested for loops vs single for loop [closed]

Tags:

c++

for-loop

nested

timing

Jamie Syme

People also ask

2 Answers

Arno

Unai Vivi

Recent Activity

Donate For Us