
clang: Force loop unroll for specific loop

Tags: c++, c, clang

Is there a way to tell clang to unroll a specific loop?


Googling for an answer gives me command-line options that affect the whole compilation unit rather than a single loop.


There is a similar question for GCC, "Tell gcc to specifically unroll a loop", but the answer provided there does not work with clang.

Option 1 suggested there:

#pragma GCC optimize ("unroll-loops")

seems to be silently ignored. In fact

#pragma GCC akjhdfkjahsdkjfhskdfhd

is also silently ignored.

Option 2:

__attribute__((optimize("unroll-loops")))

results in a warning:

warning: unknown attribute 'optimize' ignored [-Wattributes]

Update

joshuanapoli provides a nice solution that iterates via template metaprogramming and C++11 without writing a loop. The recursion is resolved at compile time, so the body is instantiated once per iteration and can be inlined repeatedly. While it is not exactly an answer to the question, it achieves essentially the same thing.

That is why I am accepting the answer. However, if you happen to know how to take a standard C loop (for, while) and force it to unroll, please share the knowledge with us!

CygnusX1 asked Mar 07 '13 15:03



2 Answers

For a C++ program, you can unroll loops within the language. You won't need to figure out compiler-specific options. For example,

#include <cstddef>
#include <iostream>

// Primary template: calls function_(I), then recurses to index I+1.
// Each step is a distinct instantiation, so the source contains no loop.
template<std::size_t N, typename FunctionType, std::size_t I>
class repeat_t
{
public:
  repeat_t(FunctionType function) : function_(function) {}
  FunctionType operator()()
  {
    function_(I);
    return repeat_t<N,FunctionType,I+1>(function_)();
  }
private:
  FunctionType function_;
};

// Partial specialization for I == N: ends the recursion and returns the
// function object so that any state it accumulated can be read back.
template<std::size_t N, typename FunctionType>
class repeat_t<N,FunctionType,N>
{
public:
  repeat_t(FunctionType function) : function_(function) {}
  FunctionType operator()() { return function_; }
private:
  FunctionType function_;
};

// Helper that deduces FunctionType and starts the recursion at index 0.
template<std::size_t N, typename FunctionType>
repeat_t<N,FunctionType,0> repeat(FunctionType function)
{
  return repeat_t<N,FunctionType,0>(function);
}

void loop_function(std::size_t index)
{
  std::cout << index << std::endl;
}

int main(int argc, char** argv)
{
  repeat<10>(loop_function)();  // prints 0 through 9, one per line
  return 0;
}

Example with a stateful function object

template<typename T, T V1>
struct sum_t
{
  sum_t(T v2) : v2_(v2) {}
  void operator()(std::size_t) { v2_ += V1; }
  T result() const { return v2_; }
private:
  T v2_;
};

int main(int argc, char* argv[])
{
  typedef sum_t<int,2> add_two;
  std::cout << repeat<4>(add_two(3))().result() << std::endl;
  return 0;
}
// output is 11 (3+2+2+2+2)

Using a closure instead of an explicit function object

int main(int argc, char* argv[])
{
  int accumulator{3};
  repeat<4>( [&](std::size_t)
  {
    accumulator += 2;
  })();
  std::cout << accumulator << std::endl;
}
joshuanapoli answered Oct 17 '22 03:10


Clang recently gained loop unrolling pragmas (such as #pragma unroll) which can be used to specify full/partial unrolling. See http://clang.llvm.org/docs/AttributeReference.html#pragma-unroll-pragma-nounroll for more details.

Jingyue Wu answered Oct 17 '22 04:10