Why will an “if-else” statement (in GPU code) cut the performance in half?

Tags: gpgpu, gpu, opencl

I read this article:

FPGA or GPU? - The evolution continues

And someone added a comment in which he wrote:

Since GPUs are SIMD any code with an “if-else” statement will cut your performance in half. Half of the cores will execute the if part of the statement while half of the cores are in idle and then the other half cores will do the else calculations while the first half of the cores remain idle.

I can't understand why.

Why does using if-else on a GPU (e.g. with OpenCL) cut the performance in half?

asked Aug 17 '17 11:08

user3668129

1 Answer

Branches in general do not affect performance, but branch divergence does. Divergence means that two threads take different paths (e.g. one fulfills the if condition, the other does not). Because all threads of a GPU execute the same "line of code", some threads have to wait while code that is not part of their path is executed.
Strictly speaking, that is not quite true: only the threads within one warp (NVIDIA) or wavefront (AMD) execute the same "line of code" in lockstep. (Currently, the warp size of NVIDIA GPUs is 32 and the wavefront size of AMD GPUs is 64.)
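To make the lockstep behaviour concrete, here is a minimal sketch in plain Python (not real GPU code). It simulates a single warp stepping through an if-else: lanes whose condition does not match sit idle while the other branch body runs. The function name `run_warp_if_else`, the threshold, and the two branch bodies are all made up for illustration; real hardware does this with execution masks, and the details vary by vendor.

```python
WARP_SIZE = 32  # NVIDIA warp size quoted in the answer above

def run_warp_if_else(values, threshold):
    """Simulate one warp executing `if v < threshold: A else: B` in lockstep.

    Returns (results, passes), where `passes` counts how many branch
    bodies the whole warp had to step through.
    """
    mask_if = [v < threshold for v in values]  # per-lane condition
    results = [None] * len(values)
    passes = 0

    # Pass 1: the "if" body runs if at least one lane needs it.
    # Lanes with a False mask are idle during this pass.
    if any(mask_if):
        passes += 1
        for lane, active in enumerate(mask_if):
            if active:
                results[lane] = values[lane] * 2

    # Pass 2: the "else" body runs if at least one lane needs it.
    # Lanes that took the "if" path are idle during this pass.
    if not all(mask_if):
        passes += 1
        for lane, active in enumerate(mask_if):
            if not active:
                results[lane] = values[lane] + 100

    return results, passes

# Every lane takes the same path: no divergence, one pass.
uniform = [1] * WARP_SIZE
_, uniform_passes = run_warp_if_else(uniform, threshold=1)

# Lanes alternate between the two paths: divergence, two passes.
mixed = [lane % 2 for lane in range(WARP_SIZE)]
mixed_results, mixed_passes = run_warp_if_else(mixed, threshold=1)
```

In this toy model the uniform warp needs one pass and the mixed warp needs two, which is where the "half the performance" figure comes from when both bodies cost about the same.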

So if there is an if-else block in your kernel, the worst-case scenario is indeed a 50% performance drop. And even worse: if there are n possible branches, the performance can drop to 1/n of the performance without divergence (that is, with no branches, or with all threads in a warp/wavefront taking the same path). Of course, for such scenarios your whole kernel must be embedded in an if-else (or switch) construct.
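The 1/n figure can be checked with a small back-of-the-envelope model, again in plain Python. It assumes (as the paragraph above does) that the whole kernel sits inside the branch and that every branch body costs the same, so a warp pays one pass per distinct branch taken by its threads. `relative_throughput` is a hypothetical helper, not part of any GPU API.

```python
def relative_throughput(branch_ids_per_warp):
    """Estimate throughput relative to a divergence-free run.

    branch_ids_per_warp: list of warps, each a list holding the branch
    index taken by every thread in that warp.
    Assumption: each distinct branch inside a warp costs one full pass.
    """
    ideal_passes = len(branch_ids_per_warp)  # one pass per warp
    actual_passes = sum(len(set(warp)) for warp in branch_ids_per_warp)
    return ideal_passes / actual_passes

# One warp of 32 threads, split over 2 branches (if-else).
two_way = relative_throughput([[t % 2 for t in range(32)]])

# One warp of 32 threads, split over 4 branches (e.g. a switch).
four_way = relative_throughput([[t % 4 for t in range(32)]])

# All threads in the warp take the same branch.
no_divergence = relative_throughput([[0] * 32])
```

Under these assumptions `two_way` comes out as 0.5 and `four_way` as 0.25, matching the 1/n worst case. Real kernels usually have some code outside the branch, so the actual loss tends to be smaller.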

But as written above, this will only happen if the threads taking different paths are in the same warp/wavefront. So it is up to you to write your code, rearrange your data, choose your algorithm, and so on, to avoid branch divergence as far as possible.
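As a rough illustration of the "rearrange data" idea, the following Python sketch counts how many warps would diverge for a given layout of per-thread branch conditions, before and after grouping equal conditions together. The helper `divergent_warps` and the sample data are invented for this example; on a real GPU the regrouping itself has a cost, so whether it pays off depends on the kernel.

```python
WARP_SIZE = 32  # assumed warp size, as quoted in the answer

def divergent_warps(flags, warp_size=WARP_SIZE):
    """Count warps whose threads do not all share the same branch flag."""
    count = 0
    for start in range(0, len(flags), warp_size):
        warp = flags[start:start + warp_size]
        if len(set(warp)) > 1:  # both True and False present
            count += 1
    return count

# 128 work items whose branch condition alternates item by item.
flags = [i % 2 == 0 for i in range(128)]

# Interleaved layout: every warp contains both paths.
interleaved = divergent_warps(flags)

# Grouped layout: items with the same condition sit next to each other,
# so each warp sees only one path.
grouped = divergent_warps(sorted(flags))
```

With the interleaved layout all 4 warps diverge; after grouping, none do, even though the same number of threads still takes each branch.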

TL;DR: There can be branches, but if different threads take different branches, they have to be in different warps/wavefronts to avoid divergence and thus performance loss.

answered Dec 26 '22 15:12

BlameTheBits
