When testing my code I noticed a significant difference in execution time depending on whether an empty ranged-for loop was present or deleted. Normally I would expect the compiler to notice that the loop serves no purpose and therefore remove it. My compiler flags include -O3 (gcc 5.4). I also tested it with a vector instead of a set, and there the optimization does happen: both versions give the same execution time. It seems that the incrementing of the set iterator is what costs all the extra time.
First case with the ranged for loop still present (slow):
#include <iostream>
#include <set>

int main () {
    long result = 0;
    std::set<double> results;
    for (int i = 2; i <= 10000; ++i) {
        results.insert(i);
        for (auto element : results) {
            // no operation
        }
    }
    std::cout << "Result: " << result << "\n";
}
Second case with the ranged for loop deleted (fast):
#include <iostream>
#include <set>

int main () {
    long result = 0;
    std::set<double> results;
    for (int i = 2; i <= 10000; ++i) {
        results.insert(i);
    }
    std::cout << "Result: " << result << "\n";
}
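For reference, the vector variant I mentioned looks roughly like this (a reconstruction, not the exact code I timed; push_back takes the place of insert), and here both versions run equally fast:

#include <iostream>
#include <vector>

int main () {
    long result = 0;
    std::vector<double> results;
    for (int i = 2; i <= 10000; ++i) {
        results.push_back(i);
        for (auto element : results) {
            // no operation -- with a vector this loop is optimized away
        }
    }
    std::cout << "Result: " << result << "\n";
}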
The two most important characteristics of optimized code are its speed and its size. Other characteristics include the amount of energy required to execute the code, the time it takes to compile the code and, in case the resulting code requires just-in-time (JIT) compilation, the time it takes to JIT-compile the code.
Code optimization in the synthesis phase is a program transformation technique that tries to improve the intermediate code by making it consume fewer resources (e.g. CPU cycles, memory), so that faster-running machine code results.
Compilers are free to optimize code so long as they can guarantee that the semantics of the code are not changed. I would suggest starting at the compiler optimization Wikipedia page, as there are many different kinds of optimization performed at many different stages.
Compiler optimization is generally implemented using a sequence of optimizing transformations, algorithms which take a program and transform it to produce a semantically equivalent output program that uses fewer resources or executes faster.
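For a concrete, if simplified, illustration of one such transformation: dead-code elimination deletes computations whose results are never observed. A minimal sketch (whether a given compiler actually performs it depends on the compiler and optimization level, e.g. -O2/-O3):

// The loop computes sum, but sum never escapes the function and the
// loop has no side effects, so an optimizer may delete the loop
// entirely (dead-code elimination) and reduce f to just "return 0;".
int f () {
    int sum = 0;
    for (int i = 0; i < 1000; ++i)
        sum += i;
    return 0;  // sum is never used
}

This is exactly the guarantee at stake in the question: the optimizer must first prove the loop has no observable effect before it may delete it.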
Internally, the std::set iterator uses some kind of pointer chain, and this seems to be the issue.
Here is a minimal setup similar to your issue:
struct S {
    S* next;
};

void f (S* s) {
    while (s)
        s = s->next;
}
It's not a problem with complex collection implementations or iterator overhead; it is simply this pointer-chain pattern that the optimizer can't optimize away. I don't know the precise reason why optimizers fail on this pattern, though.
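One way to see the difficulty (a sketch of the obstacle, connected to @hvd's suggestion below, not a confirmed explanation): nothing in f's signature rules out a cyclic chain, so deleting the loop requires proving, or being allowed to assume, that it terminates. A hypothetical caller:

struct S { S* next; };

void f (S* s) {
    while (s)
        s = s->next;
}

// Hypothetical caller: the chain is cyclic, so f(&a) never
// terminates. (Strictly, C++11 lets the compiler assume that a
// side-effect-free loop terminates, so this cannot be the whole
// story -- but it is the kind of case the optimizer must consider.)
int main () {
    S a, b;
    a.next = &b;
    b.next = &a;  // cycle
    f(&a);        // walks the chain forever
}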
Also, note that this variant is optimized away:
void f (S* s) {
    // Copy-paste the following two lines as many times as you wish
    if (s)
        s = s->next;
}
As suggested by @hvd, this might have to do with the compiler not being able to prove that the loop is not infinite. And if we write the OP's loop like so:
#include <cstddef>
#include <set>

void f (std::set<double>& s) {
    auto it = s.begin();
    for (std::size_t i = 0; i < s.size() && it != s.end(); ++i, ++it) {
        // Do nothing
    }
}
The compiler optimizes everything away, presumably because the comparison against s.size() now gives the loop a provable upper bound on its trip count.
The range-based for loop is not as trivial as it looks. The compiler translates it internally into an iterator-based loop, and if the iterator is complex enough the compiler may not even be allowed by the standard to remove those iterator operations.
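Roughly, the standard mandates a rewrite along these lines (sketched here for the set from the question; the double-underscore names mirror the standard's exposition and are reserved in real code):

#include <set>

void loop_over (std::set<double>& results) {
    // Exposition-only desugaring of: for (auto element : results) { }
    {
        auto&& __range = results;
        auto __begin = __range.begin();  // std::set<double>::iterator
        auto __end = __range.end();
        for (; __begin != __end; ++__begin) {
            auto element = *__begin;
            // no operation
        }
    }
}

Every ++__begin here is a non-trivial tree-walking step over the set's internal nodes, which is exactly the pointer-chain pattern shown above.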