
Why does the performance of my #pragma-unrolled loop degrade if the trip count is not constant?

Tags: loops, cuda, unroll

I have the following code using loop unrolling:

#pragma unroll
for (int i=0;i<n;i++)
{
    ....
}

If n is a compile-time constant, everything works fine. However, if n is a variable, performance drops dramatically: I noticed that roughly 3 times as many instructions are issued and executed. I guess I am looking for a way to do loop unrolling at run time; maybe that's just not feasible.

asked Mar 31 '11 05:03 by small_potato




1 Answer

CUDA is a compiled language. Loop unrolling is a compiler optimization. Runtime loop unrolling would imply some sort of runtime interpreter or dynamic code generation. That clearly can't happen.
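If only a handful of trip counts matter in practice, one possible workaround is to make the trip count a template parameter and choose the instantiation at run time, so each specialized loop still has a constant bound. The following is a minimal sketch in plain C++ (the same pattern should carry over to `__device__` or `__global__` functions); `sum_fixed`, `sum_generic`, `sum`, and the chosen sizes 4, 8, and 16 are hypothetical names and values for illustration, not anything from the question's code:

```cpp
// N is a compile-time constant here, so the compiler is free to unroll fully.
template <int N>
float sum_fixed(const float* x) {
    float acc = 0.0f;
#ifdef __CUDACC__
#pragma unroll
#endif
    for (int i = 0; i < N; ++i) {
        acc += x[i];
    }
    return acc;
}

// Fallback for trip counts without a specialization: an ordinary runtime loop.
float sum_generic(const float* x, int n) {
    float acc = 0.0f;
    for (int i = 0; i < n; ++i) {
        acc += x[i];
    }
    return acc;
}

// Runtime dispatch: map the variable n onto a constant-trip-count instance.
// The sizes listed are assumptions; pick the ones your workload actually uses.
float sum(const float* x, int n) {
    switch (n) {
        case 4:  return sum_fixed<4>(x);
        case 8:  return sum_fixed<8>(x);
        case 16: return sum_fixed<16>(x);
        default: return sum_generic(x, n);
    }
}
```

The cost is code size: every listed trip count adds another copy of the loop body, so this only pays off when the list is short.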

It would make sense for the unrolled case to execute as many instructions as the naïve loop, or more, because the compiler replaces the loop with repetitions of the loop contents. If the unrolled case executes fewer instructions, that would imply that the compiler is pre-calculating some or all of the loop contents and replacing code with a constant result.
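To make "repetitions of the loop contents" concrete, here is a hand-written sketch of unrolling by a factor of 4 when n is only known at run time. The function names, the loop body, and the factor are illustrative assumptions; the code a compiler actually generates will differ:

```cpp
// Rolled form: one compare-and-branch per element.
void scale_rolled(float* x, int n, float a) {
    for (int i = 0; i < n; ++i) {
        x[i] *= a;
    }
}

// Hand-unrolled by 4: the body is repeated four times per iteration,
// and a remainder loop handles the last n % 4 elements.
void scale_unrolled4(float* x, int n, float a) {
    int i = 0;
    for (; i + 3 < n; i += 4) {
        x[i]     *= a;
        x[i + 1] *= a;
        x[i + 2] *= a;
        x[i + 3] *= a;
    }
    for (; i < n; ++i) {
        x[i] *= a;
    }
}
```

Both functions produce the same result; the unrolled one trades a larger instruction stream for fewer loop-control operations, and needs the extra remainder loop precisely because n is not a constant.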

It all depends on what is contained in the loop.

answered Oct 06 '22 05:10 by talonmies