 

Inactive threads vs. predicated off threads in CUDA

I am profiling my CUDA kernel with Visual Profiler 6.0, and on nearly every line there is a bar showing the percentages of inactive threads and predicated-off threads.

I was wondering what exactly those two values mean and how 'bad' they are.

As far as I know, the inactive threads (shown in red) are threads that diverged and are inactive (due to some if statement), and the predicated-off threads (shown in blue) are threads the compiler correctly predicated to be inactive. Is that correct?

If that is true, I do not understand why the following lines in my kernel have 95% inactive threads; the only ifs there are the loops:

(Screenshot: profiler source view showing the inactive-thread percentages)

TFloat is a template parameter that is either float or double. What is causing the thread inactivity there?

I am using CUDA 6.0, and the code runs on a Tesla K40c with compute capability 3.5.

asked Apr 25 '14 by NightElfik


People also ask

How many threads are there in Nvidia CUDA warp?

NVIDIA GPUs execute warps of 32 parallel threads using SIMT, which enables each thread to access its own registers, to load and store from divergent addresses, and to follow divergent control flow paths.

What is warp in CUDA?

A warp is a set of 32 threads within a thread block such that all the threads in a warp execute the same instruction. These threads are selected serially by the SM. Once a thread block is launched on a multiprocessor (SM), all of its warps are resident until their execution finishes.
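
For illustration, here is a minimal sketch (not from the question) that derives each thread's warp and lane index within its block from the built-in threadIdx and warpSize variables:

    #include <cstdio>

    // Illustrative kernel: each thread computes which warp of its block it
    // belongs to and its lane within that warp. warpSize is a built-in
    // device variable (32 on current NVIDIA GPUs).
    __global__ void warpInfo()
    {
        int lane = threadIdx.x % warpSize;  // position within the warp
        int warp = threadIdx.x / warpSize;  // warp index within the block
        printf("block %d, warp %d, lane %d\n", (int)blockIdx.x, warp, lane);
    }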

What is a kernel in CUDA?

A kernel is a function executed on the GPU. Every CUDA kernel starts with the __global__ declaration specifier. Programmers compute a unique global ID for each thread by using built-in variables. CUDA kernels are subdivided into blocks.
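
A minimal sketch of such a kernel (the names scale, data, factor and n are illustrative, not from the question) could look like this:

    // __global__ marks the function as a kernel; the built-in variables
    // blockIdx, blockDim and threadIdx combine into a unique global thread ID.
    __global__ void scale(float *data, float factor, int n)
    {
        int gid = blockIdx.x * blockDim.x + threadIdx.x;  // unique global ID
        if (gid < n)            // guard: the last block may be only partially used
            data[gid] *= factor;
    }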

How do I launch a CUDA kernel?

In order to launch a CUDA kernel, we need to specify the grid dimension and the block dimension from the host code, between the triple angle brackets of the launch syntax. In the Hello World! example sketched below, two 1's are placed between the angle brackets, requesting one block of one thread.
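
A self-contained sketch of such a launch (the hello kernel and its message are illustrative, not taken from the article referenced above) could look like this:

    #include <cstdio>

    __global__ void hello()
    {
        printf("Hello World from thread %d of block %d\n",
               (int)threadIdx.x, (int)blockIdx.x);
    }

    int main()
    {
        // <<<gridDim, blockDim>>>: the two 1's request one block of one thread.
        // A larger launch would look like, e.g., hello<<<numBlocks, 256>>>().
        hello<<<1, 1>>>();
        cudaDeviceSynchronize();  // wait for the kernel (and its printf) to finish
        return 0;
    }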


1 Answer

From the following link:

There are two reasons threads within a warp can be disabled: being inactive, and being predicated off. If the block size is not a multiple of the warp size, the last warp in the block will have inactive threads. When some threads within a warp exit the kernel while others continue, the exiting threads become inactive. Threads become predicated off when divergent branches occur, because the separate paths taken by the threads must be serialized, and threads are disabled for paths they do not take.

So it looks like your dimensionsCount is zero (or close to it) on most of the threads, and those threads exit while a few other threads are still computing.

On the other hand, "predicated off" may be recorded where the actual branching condition is evaluated: some threads jump to the exit (but are still active!), while others jump into the loop. This is also suggested by the SASS code on the right of your screenshot: the only blue bar appears at the BRA instruction.
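
To make the two effects concrete, here is a small sketch (not your kernel; divergenceDemo, counts and the constants are made up) in which an early return produces inactive threads and a short divergent branch is likely to be compiled to predicated instructions:

    // Illustrative kernel showing the two ways a thread can be disabled.
    __global__ void divergenceDemo(const int *counts, float *out, int n)
    {
        int gid = blockIdx.x * blockDim.x + threadIdx.x;

        // Threads past n return here and stay INACTIVE for the rest of the
        // warp's lifetime, just like threads whose loop trip count is zero
        // and that finish long before their warp mates.
        if (gid >= n)
            return;

        float acc = 0.0f;
        for (int i = 0; i < counts[gid]; ++i)  // per-thread trip count => divergence
            acc += 0.5f * i;

        // A short branch like this is typically compiled to predicated
        // instructions: both sides are issued, and threads on the untaken
        // side show up as PREDICATED OFF rather than inactive.
        if (acc > 10.0f)
            acc = 10.0f;

        out[gid] = acc;
    }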

answered Sep 23 '22 by Dimaleks