If you have work-items executing in a wavefront and there is a conditional such as:
if(x){
...
}
else{
....
}
What do the work-items execute? Is it the case that all work-items in the wavefront execute the first branch (i.e. x == true), and if there are no work-items for which x is false, the rest of the conditional is skipped?
What happens if one work-item takes the alternative path? I am told that all work-items will execute the alternative path as well (therefore executing both paths). Why is this the case, and how does it not mess up the program execution?
GPU programming is a method of running highly parallel general-purpose computations on GPU accelerators. While early GPUs were designed exclusively for computer graphics, today they are used extensively for general-purpose computing (GPGPU computing) as well.
Using the CUDA Toolkit you can accelerate your C or C++ applications by updating the computationally intensive portions of your code to run on GPUs. To accelerate your applications, you can call functions from drop-in libraries as well as develop custom applications using languages including C, C++, Fortran and Python.
In a typical CUDA program, data are first sent from main memory to the GPU memory, then the CPU sends instructions to the GPU, then the GPU schedules and executes the kernel on the available parallel hardware, and finally results are copied back from the GPU memory to the CPU memory. ...
To execute any CUDA program, there are three main steps:
1. Copy the input data from host memory to device memory (host-to-device transfer).
2. Load the GPU program and execute it, caching data on-chip for performance.
3. Copy the results from device memory back to host memory (device-to-host transfer).
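The three steps above can be sketched as a minimal host program; the kernel name `square` and the sizes are illustrative, not from the original post:

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Illustrative kernel: squares each element in place.
__global__ void square(float *d, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= d[i];
}

int main() {
    const int n = 1024;
    float h[n];
    for (int i = 0; i < n; ++i) h[i] = (float)i;

    float *d;
    cudaMalloc(&d, n * sizeof(float));

    // 1. Host-to-device transfer
    cudaMemcpy(d, h, n * sizeof(float), cudaMemcpyHostToDevice);

    // 2. Launch the kernel on the device
    square<<<(n + 255) / 256, 256>>>(d, n);

    // 3. Device-to-host transfer
    cudaMemcpy(h, d, n * sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(d);
    printf("h[3] = %f\n", h[3]); // expect 9.0
    return 0;
}
```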
NVIDIA GPUs use conditional execution to handle branch divergence within the SIMD group ("warp"). In your if..else
example, both branches get executed by every thread in the diverging warp, but those threads which don't follow a given branch are masked off and perform a null op instead. This is the classic branch divergence penalty: intra-warp branch divergence takes two passes through the code section to retire for the whole warp. This isn't ideal, which is why performance-oriented code tries to minimize it. One thing which often catches people out is making assumptions about which side of a divergent path gets executed "first". There have been some very subtle bugs caused by second-guessing the internal order of execution within a divergent warp.
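A minimal sketch of this situation (kernel and names are my own illustration): even-numbered and odd-numbered threads in the same warp take different branches, so the hardware serializes the two paths, yet every thread still ends up with the correct result because inactive lanes are merely masked, not executed with wrong data:

```cuda
// Illustrative kernel: threads in the same warp take different branches.
// The warp runs the "then" path with odd lanes masked off, then the
// "else" path with even lanes masked off; each thread's result is still
// correct, which is why divergence costs performance but not correctness.
__global__ void diverge(int *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    if (i % 2 == 0) {
        out[i] = i * 10;   // executed only by even lanes
    } else {
        out[i] = i + 100;  // executed only by odd lanes
    }
}
```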
For simpler conditionals, NVIDIA GPUs support conditional evaluation at the ALU (predication), which causes no divergence; and for conditionals where the whole warp follows the same path, there is obviously no penalty either.
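As an illustration of the predication case (a sketch; what the compiler actually emits depends on the target architecture), a short conditional assignment like the one below can typically be compiled to a predicated instruction or a select rather than a divergent branch:

```cuda
// Illustrative kernel: the body of the conditional is a single short
// assignment, so the compiler can usually use predication/select
// instead of an actual branch, and no warp divergence occurs.
__global__ void clamp_negative(float *x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        x[i] = (x[i] < 0.0f) ? 0.0f : x[i];
    }
}
```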