Loop unrolling in clang

I am trying to selectively unroll the second loop in the following program:

#include <stdio.h>

int main()
{
    int in[1000], out[1000]; 
    int i,j;

    #pragma nounroll
    for (i = 100; i < 1000; i++)
    {
       in[i]+= 10;
    }

    #pragma unroll 2
    for (j = 100; j < 1000; j++)
    {
       out[j]+= 10;
    }

    return 1;
}

When I run clang (3.5) with the following options, it unrolls both loops by a factor of 4.

clang -std=c++11 -O3 -fno-slp-vectorize -fno-vectorize -mllvm -unroll-count=4 -mllvm -debug-pass=Arguments -emit-llvm -c *.cpp 

What am I doing wrong? Also, if I add -fno-unroll-loops or omit the -unroll-count=4 flag, neither loop is unrolled.

Also, any hints on how to debug pragma errors?

k01 asked Dec 05 '14 06:12



2 Answers

I think there is no support for such pragmas in clang 3.5.

However, starting from 3.6, you can use #pragma clang loop unroll(enable | disable) to enable or disable the automatic heuristic-based unrolling. To fully unroll a loop, use the shorthand #pragma clang loop unroll(full). You can also use #pragma clang loop unroll_count(N), where N is a compile-time constant, to specify the unroll count explicitly.

More info here.

Your code rewritten using these pragmas:

#include <stdio.h>

int main()
{

  int in[1000], out[1000]; 
  int i,j;

  #pragma clang loop unroll(disable)
  for (i = 100; i < 1000; i++)
  {
     in[i]+= 10;
  }

  #pragma clang loop unroll_count(2)
  for (j = 100; j < 1000; j++)
  {
     out[j]+= 10;
  }


  return 1;
}
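
The unroll(full) form is not used in the rewrite above, so here is a minimal sketch of it. The helper name sum8 and the fixed length of 8 are made up for illustration; the point is only that the trip count is a compile-time constant, which is what makes full unrolling possible. Compilers that do not know the pragma should simply ignore it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from the question): sums exactly 8 ints.
   The loop bound is a compile-time constant, so the compiler is able
   to honor the request to unroll the loop completely. */
int sum8(const int *v)
{
    int total = 0;
    size_t k;

    assert(v != NULL);

    #pragma clang loop unroll(full)
    for (k = 0; k < 8; k++)
    {
        total += v[k];
    }

    return total;
}
```

Returning the sum gives the loop an observable result, which makes it less likely that the optimizer removes the loop before the unroller ever sees it.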

plasmacel answered Oct 08 '22 19:10


-unroll-count=4 has a higher priority than #pragma clang loop unroll_count(2), so the compiler follows the command-line option rather than the pragma, and both loops end up unrolled by 4. Also, as plasmacel mentioned, #pragma clang loop unroll is not supported before clang 3.6.
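
A rough sketch of how the per-loop hints could be left in control, assuming clang 3.6 or later: keep the pragmas and drop the global -mllvm -unroll-count=4 override from the command line. The function names and the file name unroll_demo.c are hypothetical, and the -Rpass=loop-unroll remark flag is suggested here as one possible way to see what the unroller did; it is not something the answers above mention, and its output depends on the clang version.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers (names are illustrative, not from the question).
   Taking the array as a parameter keeps the stores observable to the caller. */

/* Ask the compiler to leave this loop rolled. */
void add_ten_rolled(int *data, size_t n)
{
    size_t i;

    assert(data != NULL);

    #pragma clang loop unroll(disable)
    for (i = 0; i < n; i++)
    {
        data[i] += 10;
    }
}

/* Ask the compiler to unroll this loop by a factor of 2. */
void add_ten_unrolled_by_2(int *data, size_t n)
{
    size_t j;

    assert(data != NULL);

    #pragma clang loop unroll_count(2)
    for (j = 0; j < n; j++)
    {
        data[j] += 10;
    }
}

/* Assumed build line; note that -mllvm -unroll-count=4 is absent,
   so it cannot override the pragmas:

     clang -O3 -fno-vectorize -fno-slp-vectorize -Rpass=loop-unroll -c unroll_demo.c
*/
```

Both functions compute the same result whether or not the hints are honored; the pragmas only change the shape of the generated code, so the emitted IR or assembly still has to be inspected to confirm the unroll factor.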

jtony answered Oct 08 '22 20:10