Loop unrolling behaviour in GCC

Tags:

This question is in part a follow up question to GCC 5.1 Loop unrolling.

According to the GCC documentation, and as stated in my answer to the above question, flags such as -funroll-loops turn on "complete loop peeling (i.e. complete removal of loops with a small constant number of iterations)". Therefore, when such a flag is enabled, the compiler can choose to unroll a loop if it determines that this would optimise the execution of a given piece of code.

Nevertheless, I noticed in one of my projects that GCC would sometimes unroll loops even though the relevant flags were not enabled. For instance, consider the following simple piece of code:

int main(int argc, char **argv)
{
  int k = 0;
  for( k = 0; k < 5; ++k )
  {
    volatile int temp = k;
  }
}

When compiling with -O1, the loop is unrolled and the following assembly code is generated with any modern version of GCC:

main:
        movl    $0, -4(%rsp)
        movl    $1, -4(%rsp)
        movl    $2, -4(%rsp)
        movl    $3, -4(%rsp)
        movl    $4, -4(%rsp)
        movl    $0, %eax
        ret

Even when compiling with the additional -fno-unroll-loops -fno-peel-loops to make sure the flags are disabled, GCC unexpectedly still performs loop unrolling on the example described above.

This observation leads me to the following closely related questions. Why does GCC perform loop unrolling even though the flags corresponding to this behaviour are disabled? Is unrolling also controlled by other flags which can make the compiler unroll a loop in some cases even though -funroll-loops is disabled? Is there a way to completely disable loop unrolling in GCC (a part from compiling with -O0)?

Interestingly the Clang compiler has the expected behaviour here, and seems to only perform unrolling when -funroll-loops is enabled, and not in other cases.

Thanks in advance, any additional insights on this matter would be greatly appreciated!

819

asked Sep 13 '16 20:09

Pyves

1 Answers

Why does GCC perform loop unrolling even though the flags corresponding to this behaviour are disabled?

Think of it from a pragmatic view: what do you want when passing such flag to the compiler? No C++ developer will ask GCC to unroll or not unroll loops, just for the sake of having loops or not in assembly code, there is a goal. The goal with -fno-unroll-loops is, for example, to sacrifice a bit of speed in order to reduce the size of your binary, if you are developing an embedded software with limited storage. On the other hand, the goal with -funrool-loops is to tell the compiler that you do not care about the size of you binary, so it should not hesitate to unroll loops.

But that does not mean that the compiler will blindly unroll or not all your loops!

In your example, the reason is simple: the loop contains only one instruction - few bytes on any platforms - and the compiler knows that this is negligeable and will anyway take almost the same size as the assembly code needed for the loop (sub + mov + jne on x86-64).

This is why gcc 6.2, with -O3 -fno-unroll-loops turns this code:

int mul(int k, int j) 
{   
  for (int i = 0; i < 5; ++i)
    volatile int k = j;

  return k; 
}

... to the following assembly code:

 mul(int, int):
  mov    DWORD PTR [rsp-0x4],esi
  mov    eax,edi
  mov    DWORD PTR [rsp-0x4],esi
  mov    DWORD PTR [rsp-0x4],esi
  mov    DWORD PTR [rsp-0x4],esi
  mov    DWORD PTR [rsp-0x4],esi  
  ret

It does not listen to you because it would (almost, depending on the architecture) not change the size of the binary but it is faster. However, if you increase a bit your loop counter...

int mul(int k, int j) 
{   
  for (int i = 0; i < 20; ++i)
    volatile int k = j;

  return k; 
}

... it follows your hint:

 mul(int, int):
  mov    eax,edi
  mov    edx,0x14
  nop    WORD PTR [rax+rax*1+0x0]
  sub    edx,0x1
  mov    DWORD PTR [rsp-0x4],esi
  jne    400520 <mul(int, int)+0x10>
  repz ret

You will get the same behavior if you keep your loop counter at 5 but you add some code into the loop.

To sum up, think of all these optimization flags as a hint for the compiler, and from a pragmatic developer point of view. It is always a trade-off, and when you build a software, you never want to ask for all or no loop unrolling.

As a final note, another very similar example is the -f(no-)inline-functions flag. I am fighting every day the compiler to inline (or not!) some of my functions (with the inline keyword and __attribute__ ((noinline)) with GCC), and when I check the assembly code, I see that this smartass is still doing sometimes what it wants, when I want to inline a function that is definitely too long for its taste. And most of the time, it is the right thing to do and I am happy!

166

answered Oct 03 '22 02:10

AntiClimacus

Related questions
                            
                                Does using bitwise not operator (~) on boolean values invoke Undefined Behavior?
                            
                                Why a default constructor is needed using unordered_map and tuple?
                            
                                Do pointer types wrap around their maximum value?
                            
                                Is there way to use rapidjson with std::string efficiently?
                            
                                Can WM_NEXTDLGCTL be used with non-dialog windows?
                            
                                Passing template function as argument for normal function
                            
                                C++ How to create a std::unique_ptr from a class that takes parameters on constructor
                            
                                Is using 'decltype' in the declaration of pointer to member valid?
                            
                                CMake link a shared library to static libraries
                            
                                Why is C++17 adding attributes to namespaces and enumerators?
                            
                                How to write a function pointer to a function returning a function pointer to a function?
                            
                                invalid conversion from ‘const type*’ to ‘type*’
                            
                                SFINAE and noexcept specifier
                            
                                Passing function objects into std algorithms by reference
                            
                                Are public, in-class typedefs necessary in C++1y?
                            
                                At what point does memory allocated by malloc get a type?
                            
                                Why member variables of a const object are not const
                            
                                strange behavior when trying to compile a source with tcc against gcc generated .o file
                            
                                type does not provide a call operator
                            
                                I am using c++14 or is my code, using structs, invalid?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Loop unrolling behaviour in GCC

Tags:

c++

compiler-optimization

gcc

loop-unrolling

Pyves

People also ask

1 Answers

AntiClimacus

Recent Activity

Donate For Us