Inactive threads vs. predicated off threads in CUDA

I am profiling my CUDA kernel using Visual Profiler 6.0 and on nearly every line there is a bar which shows percentages of Inactive threads and Predicated off threads.

I was wondering what exactly those two values mean and how 'bad' they are.

As far as I know, Inactive threads (shown in red) are threads that diverged and are inactive (due to some if statement), and Predicated off threads (shown in blue) are threads that the compiler correctly predicated to be inactive. Is that correct?

If that is true, I do not understand why the following lines in my kernel show 95% inactive threads; the only conditionals there are the loop conditions:

[Image: Inactive threads print-screen]

TFloat is a template parameter that is either float or double. What is causing the thread inactivity there?

I am using CUDA 6.0 and the code is running on a Tesla K40c (compute capability 3.5).

asked Apr 25 '14 03:04

NightElfik

1 Answer

From the following link:

There are two reasons threads within a warp can be disabled: being inactive, and being predicated off. If the block size is not a multiple of the warp size, the last warp in the block will have inactive threads. When some threads within a warp exit the kernel while others continue, the exiting threads become inactive. Threads become predicated off when divergent branches occur, because the separate paths taken by the threads must be serialized, and threads are disabled for paths they do not take.
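To make the first case in the quote concrete, here is a small host-side sketch (plain C++, not device code) of the arithmetic for a block size that is not a multiple of the warp size. The helper names `warpsPerBlock` and `inactiveLanesInLastWarp` are made up for illustration, and the warp size of 32 is assumed as a constant rather than queried from the device.

```cpp
#include <cassert>

// Warp size assumed here; it is 32 on current NVIDIA GPUs,
// including compute capability 3.5 devices such as the Tesla K40c.
constexpr int kWarpSize = 32;

// Number of warps the hardware needs to cover one block of blockSize threads.
// (Hypothetical helper, for illustration only.)
int warpsPerBlock(int blockSize) {
    assert(blockSize > 0);
    return (blockSize + kWarpSize - 1) / kWarpSize;  // round up
}

// Lanes in the last warp of the block that have no thread assigned.
// These lanes are inactive for the whole lifetime of the warp.
// (Hypothetical helper, for illustration only.)
int inactiveLanesInLastWarp(int blockSize) {
    assert(blockSize > 0);
    const int remainder = blockSize % kWarpSize;
    return remainder == 0 ? 0 : kWarpSize - remainder;
}
```

For example, a block of 100 threads is covered by 4 warps, and the last warp carries only 4 real threads, so 28 of its 32 lanes are inactive on every instruction it issues. A block of 256 threads has no such lanes.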

So it looks like your dimensionsCount is zero (or close to it) for most of the threads, and they exit while a few other threads are still computing.
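A rough host-side model (plain C++, not device code) shows how such a distribution can produce a figure in the 95% range. It assumes that a warp keeps issuing the loop body until its slowest thread is done and that threads which have already finished count as disabled lanes; how the profiler attributes those lanes exactly is not modelled here. The function name `inactiveLaneFraction` and the trip counts are invented for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Fraction of lane slots that do no work while one warp runs a loop whose
// trip count differs per thread. tripCounts[i] is the (non-negative) number
// of iterations thread i executes; the warp as a whole issues the loop body
// as many times as its longest-running thread needs.
// (Hypothetical helper, a simplified model rather than what the profiler computes.)
double inactiveLaneFraction(const std::vector<int>& tripCounts) {
    assert(!tripCounts.empty());
    const int longest = *std::max_element(tripCounts.begin(), tripCounts.end());
    if (longest == 0) {
        return 0.0;  // the loop body is never issued, so nothing is wasted
    }
    // Total lane slots = issues of the loop body * lanes in the warp.
    const long long totalSlots =
        static_cast<long long>(longest) * static_cast<long long>(tripCounts.size());
    long long busySlots = 0;
    for (int count : tripCounts) {
        busySlots += count;  // slots in which this lane does useful work
    }
    return 1.0 - static_cast<double>(busySlots) / static_cast<double>(totalSlots);
}
```

With 32 threads where 31 have a trip count of zero and one runs 20 iterations, the model gives 1 - 20/640, i.e. about 97% of lane slots idle. If all threads run the same number of iterations, the fraction is zero.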

On the other hand, "predicated off" may be recorded when the actual branching condition is hit: some threads jump to the exit (but are still active!), while others jump back into the loop. This is also suggested by the SASS code on the right of your snapshot: the only blue bar appears at the BRA instruction.

answered Sep 23 '22 09:09

Dimaleks

Tags: c++, profiling, cuda
