 

What do work items execute when conditionals are used in GPU programming?

If you have work items executing in a wavefront and there is a conditional such as:

    if (x) {
        ...
    }
    else {
        ...
    }

What do the work-items execute? Is it the case that all work-items in the wavefront execute the first branch (i.e. where x == true), and if there are no work-items for which x is false, the rest of the conditional is skipped?

What happens if one work-item takes the alternative path? I am told that all work-items will execute the alternate path as well (therefore executing both paths?). Why is this the case, and how does it not mess up the program execution?

asked May 05 '11 by Roger

People also ask

What is GPU programming used for?

GPU Programming is a method of running highly parallel general-purpose computations on GPU accelerators. While the past GPUs were designed exclusively for computer graphics, today they are being used extensively for general-purpose computing (GPGPU computing) as well.

Can you run code on GPU?

Using the CUDA Toolkit you can accelerate your C or C++ applications by updating the computationally intensive portions of your code to run on GPUs. To accelerate your applications, you can call functions from drop-in libraries as well as develop custom applications using languages including C, C++, Fortran and Python.
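
As a rough sketch of the "drop-in library" route mentioned above, the following toy program uses Thrust, a parallel-algorithms library bundled with the CUDA Toolkit. The Doubler functor and the values are made up for illustration:

    #include <thrust/device_vector.h>
    #include <thrust/transform.h>
    #include <cstdio>

    // Illustrative functor: doubles a value on the device.
    struct Doubler {
        __host__ __device__ float operator()(float x) const { return 2.0f * x; }
    };

    int main() {
        // device_vector allocates on the GPU; element access copies back to the host.
        thrust::device_vector<float> v(4, 1.5f);
        thrust::transform(v.begin(), v.end(), v.begin(), Doubler());
        printf("v[0] = %f\n", static_cast<float>(v[0])); // expect 3.0
        return 0;
    }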

What is the correct order of CUDA program processing?

In a typical CUDA program, data is first sent from main memory to the GPU memory; then the CPU sends instructions to the GPU; the GPU schedules and executes the kernel on the available parallel hardware; and finally results are copied back from the GPU memory to the CPU memory.

What are the three general sections of a CUDA program?

To execute any CUDA program, there are three main steps: Copy the input data from host memory to device memory, also known as host-to-device transfer. Load the GPU program and execute, caching data on-chip for performance. Copy the results from device memory to host memory, also called device-to-host transfer.
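
Here is a minimal sketch of those three steps in plain CUDA C++. The vecAdd kernel name, the sizes, and the launch configuration are illustrative, not prescribed:

    #include <cstdio>
    #include <cuda_runtime.h>

    // Illustrative kernel: element-wise vector addition.
    __global__ void vecAdd(const float *a, const float *b, float *c, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) c[i] = a[i] + b[i];
    }

    int main() {
        const int n = 1024;
        const size_t bytes = n * sizeof(float);
        float ha[n], hb[n], hc[n];
        for (int i = 0; i < n; ++i) { ha[i] = 1.0f; hb[i] = 2.0f; }

        float *da, *db, *dc;
        cudaMalloc(&da, bytes); cudaMalloc(&db, bytes); cudaMalloc(&dc, bytes);

        // Step 1: host-to-device transfer.
        cudaMemcpy(da, ha, bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(db, hb, bytes, cudaMemcpyHostToDevice);

        // Step 2: load and execute the GPU program.
        vecAdd<<<(n + 255) / 256, 256>>>(da, db, dc, n);

        // Step 3: device-to-host transfer.
        cudaMemcpy(hc, dc, bytes, cudaMemcpyDeviceToHost);
        printf("hc[0] = %f\n", hc[0]); // expect 3.0

        cudaFree(da); cudaFree(db); cudaFree(dc);
        return 0;
    }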


1 Answer

NVIDIA GPUs use conditional execution to handle branch divergence within the SIMD group ("warp"). In your if..else example, both branches get executed by every thread in the diverging warp, but those threads which don't follow a given branch are flagged and perform a null op instead. This is the classic branch divergence penalty: intra-warp branch divergence means the warp takes two passes through the code section to retire. This isn't ideal, which is why performance-oriented code tries to minimize it. One thing which often catches people out is making an assumption about which section of a divergent path gets executed "first". There have been some very subtle bugs caused by second-guessing the internal order of execution within a divergent warp.
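
To make this concrete, here is an illustrative kernel (the name divergent and the arithmetic are made up) in which odd and even lanes of every 32-thread warp take different branches, forcing the serialization described above:

    #include <cstdio>
    #include <cuda_runtime.h>

    // Illustrative kernel: odd and even lanes of each warp take different
    // branches, so the hardware runs both paths with inactive threads masked off.
    __global__ void divergent(int *out) {
        int i = threadIdx.x;
        if (i % 2 == 0) {
            out[i] = i * 10;   // even lanes active here, odd lanes masked
        } else {
            out[i] = i + 100;  // odd lanes active here, even lanes masked
        }
    }

    int main() {
        int *dev;
        cudaMalloc(&dev, 64 * sizeof(int));
        divergent<<<1, 64>>>(dev);

        int host[64];
        cudaMemcpy(host, dev, sizeof(host), cudaMemcpyDeviceToHost);
        printf("%d %d\n", host[0], host[1]); // 0 and 101: both branches ran
        cudaFree(dev);
        return 0;
    }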

For simpler conditionals, NVIDIA GPUs support conditional evaluation at the ALU, which causes no divergence, and for conditionals where the whole warp follows the same path, there is also obviously no penalty.
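
As a rough sketch of those two non-diverging cases (the kernel and variable names are illustrative, and whether the compiler actually predicates the small select depends on the compiler and architecture):

    #include <cstdio>
    #include <cuda_runtime.h>

    __global__ void no_divergence(int *out, const int *in, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) {
            // Small conditional: usually compiled to a predicated select,
            // so there is no branch to diverge on.
            int v = in[i];
            out[i] = (v < 0) ? -v : v;

            // Warp-uniform conditional: blockIdx.x is identical for every
            // thread in a block, so the whole warp takes one path together.
            if (blockIdx.x == 0)
                out[i] += 1;
        }
    }

    int main() {
        const int n = 64;
        int h_in[n], h_out[n];
        for (int i = 0; i < n; ++i) h_in[i] = (i % 2 == 0) ? -i : i;

        int *d_in, *d_out;
        cudaMalloc(&d_in, n * sizeof(int));
        cudaMalloc(&d_out, n * sizeof(int));
        cudaMemcpy(d_in, h_in, n * sizeof(int), cudaMemcpyHostToDevice);

        no_divergence<<<2, 32>>>(d_out, d_in, n);

        cudaMemcpy(h_out, d_out, n * sizeof(int), cudaMemcpyDeviceToHost);
        printf("%d %d\n", h_out[2], h_out[33]); // 3 (|-2| + 1) and 33
        cudaFree(d_in); cudaFree(d_out);
        return 0;
    }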

answered Oct 12 '22 by talonmies