In C++11, we can use a much simpler "for" loop when iterating over a container, like the following:
for (auto i : {1, 2, 3, 4})
...;
However, I don't know how efficient such code is. Specifically:
What is the type of {1, 2, 3, 4}?
Will the compiler unroll the loop?
Update: Suppose we are using -O2, and the code in the loop is only a few operations. In my case, I want to enumerate the four directions UP, DOWN, LEFT, RIGHT and call a function with the direction as a parameter. I only care whether the program can achieve the best performance.
Thank you very much!
Second: Yes, modern compilers know how to unroll a loop like this, if it is a good idea for your target CPU. Third: Modern compilers can even auto-vectorize the loop, which can be even better than unrolling.
I have found the gcc flag -funroll-all-loops. If I understand correctly, this will unroll all loops automatically without any effort by the programmer.
When there are no loops in your code? When the loop body is large, the loop-control overhead is trivial, and the larger code size due to unrolling can cause instruction cache misses. In large applications, indiscriminate loop unrolling often results in slower code.
But why would unrolled loops be faster in the first place? One reason for their increased performance is that they lead to fewer instructions being executed. Let us estimate the number of instructions that need to be executed with each iteration of the simple (rolled) loop. We need to load two values into registers.
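To make the rolled versus unrolled comparison concrete, here is a minimal sketch. It assumes the loop in question is something like a dot product, which the excerpt does not confirm (it only says two values are loaded per iteration). The function names dot_rolled and dot_unrolled4 are made up for illustration.

```cpp
#include <cstddef>

// Rolled loop: every iteration pays for two loads, one multiply-add,
// one index increment, and one compare-and-branch.
int dot_rolled(const int* a, const int* b, std::size_t n) {
    int sum = 0;
    for (std::size_t i = 0; i < n; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}

// Manually unrolled by four: the increment and the compare-and-branch
// are paid once per four elements instead of once per element.
int dot_unrolled4(const int* a, const int* b, std::size_t n) {
    int sum = 0;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        sum += a[i]     * b[i];
        sum += a[i + 1] * b[i + 1];
        sum += a[i + 2] * b[i + 2];
        sum += a[i + 3] * b[i + 3];
    }
    // Remainder loop for the last n % 4 elements.
    for (; i < n; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}
```

Both functions compute the same result. Whether the unrolled one is actually faster depends on the compiler and the target CPU, since an optimizer may already transform the rolled version on its own.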
What is the type of {1, 2, 3, 4}?
A std::initializer_list<int> will be constructed from that initialiser, and that is what is being iterated. You even need to include <initializer_list> for this to work.
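The deduced types can be checked at compile time. The following is only a sketch, and the function names sum_braced_list and braced_list_size are made up for illustration:

```cpp
#include <initializer_list>
#include <type_traits>

// Iterates the braced list directly, as in the question.
int sum_braced_list() {
    int total = 0;
    for (auto i : {1, 2, 3, 4}) {
        // Each element is deduced as a plain int.
        static_assert(std::is_same<decltype(i), int>::value,
                      "element type should be int");
        total += i;
    }
    return total;
}

// Names the list first, so that its type can be inspected.
int braced_list_size() {
    auto list = {1, 2, 3, 4};
    // auto deduces std::initializer_list<int> from a braced list of ints.
    static_assert(
        std::is_same<decltype(list), std::initializer_list<int>>::value,
        "a braced list of ints should deduce to std::initializer_list<int>");
    return static_cast<int>(list.size());
}
```

If either static_assert failed, the code would not compile, so a successful build already confirms the types.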
Will compiler unroll the loop?
The language doesn't guarantee loop unrolling. You can find out whether a particular compiler unrolls a particular loop, with particular options and for a particular target CPU, by compiling and inspecting the generated assembly.
That said, the number of iterations is known at compile time, and therefore it is possible for the compiler to unroll the entire loop.
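For the four-directions case from the question, a sketch like the one below can be compiled and its assembly compared against a hand-unrolled version. Everything here is hypothetical: Direction, Point, step, in_bounds and the two count_neighbours functions are illustrative names, not from the question, and directions.cpp is an assumed file name.

```cpp
#include <initializer_list>

// To inspect the generated code (file name is an assumption):
//   g++ -std=c++11 -O2 -S directions.cpp -o directions.s
// Add -funroll-loops to the command to compare the output.

enum class Direction { Up, Down, Left, Right };

struct Point {
    int x;
    int y;
};

// Hypothetical helper: move one cell in the given direction.
Point step(Point p, Direction d) {
    switch (d) {
        case Direction::Up:    return Point{p.x, p.y - 1};
        case Direction::Down:  return Point{p.x, p.y + 1};
        case Direction::Left:  return Point{p.x - 1, p.y};
        case Direction::Right: return Point{p.x + 1, p.y};
    }
    return p;
}

bool in_bounds(Point p, int width, int height) {
    return p.x >= 0 && p.x < width && p.y >= 0 && p.y < height;
}

// Range-based for over a braced list: the trip count (4) is known
// at compile time, so the compiler is free to unroll it.
int count_neighbours_loop(Point p, int width, int height) {
    int count = 0;
    for (auto d : {Direction::Up, Direction::Down,
                   Direction::Left, Direction::Right}) {
        if (in_bounds(step(p, d), width, height)) ++count;
    }
    return count;
}

// The same work written out by hand, for comparison in the assembly.
int count_neighbours_unrolled(Point p, int width, int height) {
    int count = 0;
    if (in_bounds(step(p, Direction::Up), width, height)) ++count;
    if (in_bounds(step(p, Direction::Down), width, height)) ++count;
    if (in_bounds(step(p, Direction::Left), width, height)) ++count;
    if (in_bounds(step(p, Direction::Right), width, height)) ++count;
    return count;
}
```

The two functions return the same value for any input. If the compiler emits similar code for both, writing the loop costs nothing; if not, only a measurement can tell which one is faster on a given target.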
Suppose we are using -O2
For what it's worth, -O2 does not enable -funroll-loops. Before you add that option, read its documentation:
-funroll-loops
Unroll loops whose number of iterations can be determined at compile time or upon entry to the loop. -funroll-loops implies -frerun-cse-after-loop. This option makes code larger, and may or may not make it run faster.
In this example, Clang did unroll the loop: https://godbolt.org/z/enKzMh while GCC did not: https://godbolt.org/z/ocfor8