
Why does an “if-else” statement in GPU code cut the performance in half?

Tags: gpgpu, gpu, opencl

I read this article:

FPGA or GPU? - The evolution continues

And someone added a comment in which he wrote:

Since GPUs are SIMD any code with an “if-else” statement will cut your performance in half. Half of the cores will execute the if part of the statement while half of the cores are in idle and then the other half cores will do the else calculations while the first half of the cores remain idle.

I can't understand why.

Why does using an if-else statement in GPU code (e.g. OpenCL) cut the performance in half?

user3668129 asked Aug 17 '17 11:08




1 Answer

Branches in general do not hurt performance, but branch divergence does. Divergence means that two threads take different paths (e.g. one fulfills the if condition, the other does not). Because all threads of a GPU execute the same "line of code", some threads have to wait while code that is not part of their path is executed.
Well, that is not quite true: only the threads within one warp (NVIDIA) or wavefront (AMD) execute the same "line of code". (Currently, the warp size of NVIDIA GPUs is 32 and the wavefront size of AMD GPUs is 64; newer AMD RDNA GPUs also support a wavefront size of 32.)
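To make the lockstep behaviour concrete, here is a minimal host-side sketch in Python (not GPU code). It assumes a simplified cost model in which a warp issues a branch body for all of its threads as soon as at least one thread needs it, with the other threads masked off. The function name `warp_cost` and the cost values are made up for illustration.

```python
WARP_SIZE = 32  # NVIDIA warp size quoted in the answer


def warp_cost(conditions, cost_if=10, cost_else=10):
    """Instruction slots one warp spends on an if-else block.

    conditions: one boolean per thread in the warp.
    Assumed model: a path is issued for the whole warp as soon as a
    single thread takes it; threads not on that path sit idle.
    """
    cost = 0
    if any(conditions):       # at least one thread takes the if-branch
        cost += cost_if
    if not all(conditions):   # at least one thread takes the else-branch
        cost += cost_else
    return cost


uniform = [True] * WARP_SIZE                         # every thread agrees
divergent = [i % 2 == 0 for i in range(WARP_SIZE)]   # threads alternate

print(warp_cost(uniform))    # 10: only the if-body is issued
print(warp_cost(divergent))  # 20: both bodies are issued one after the other
```

In this model a single disagreeing thread is enough to make the warp pay for both paths, which is why the mix of conditions inside a warp matters, and the mix across the whole GPU does not.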

So if there is an if-else block in your kernel, the worst-case scenario is indeed a 50% performance drop. Even worse: if there are n possible branches, performance can drop to 1/n of the divergence-free performance (divergence-free meaning there are no branches, or all threads in a warp/wavefront take the same path). Of course, for such scenarios your whole kernel must be embedded in an if-else (or switch) construct, because any code outside the branches runs at full speed regardless.
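The 1/n figure can be checked with a rough back-of-the-envelope model, sketched here in Python. It assumes every branch body costs the same and that a warp runs each distinct branch taken by its threads one after the other. `relative_performance`, `branch_cost` and `shared_cost` are hypothetical names, not part of any GPU API.

```python
def relative_performance(branch_ids, branch_cost=10, shared_cost=0):
    """Performance of one warp relative to a divergence-free warp.

    branch_ids:  the branch each thread in the warp takes.
    branch_cost: assumed cost of one branch body (same for all branches).
    shared_cost: assumed cost of code outside the if-else/switch.
    """
    distinct = len(set(branch_ids))               # branches run serially
    divergent_time = shared_cost + distinct * branch_cost
    uniform_time = shared_cost + branch_cost      # all threads agree
    return uniform_time / divergent_time


two_way = [i % 2 for i in range(32)]    # if-else, fully divergent
four_way = [i % 4 for i in range(32)]   # switch with 4 cases, fully divergent

print(relative_performance(two_way))                   # 0.5
print(relative_performance(four_way))                  # 0.25
print(relative_performance(four_way, shared_cost=20))  # 0.5
```

The last line shows why the whole kernel has to sit inside the branch construct to reach 1/n: any work shared by all threads dilutes the loss.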

But as written above, this only happens if the threads taking different paths are in the same warp/wavefront. So it is up to you to write your code, rearrange your data, choose your algorithm, and so on, to avoid branch divergence as far as possible.
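As one illustration of rearranging data, the Python sketch below counts how many warps diverge before and after grouping the elements by their branch condition. It assumes thread i processes element i and that consecutive threads are packed into warps of 32. `divergent_warps` is a hypothetical helper, and the sort merely stands in for whatever partitioning step suits the real algorithm.

```python
WARP_SIZE = 32  # assumed warp size; 64 for the AMD wavefronts in the answer


def divergent_warps(flags, warp_size=WARP_SIZE):
    """Count warps whose threads disagree on the branch condition.

    flags: one boolean per thread, where thread i handles element i.
    """
    count = 0
    for start in range(0, len(flags), warp_size):
        warp = flags[start:start + warp_size]
        if any(warp) and not all(warp):   # mixed True/False in this warp
            count += 1
    return count


# 128 elements whose condition alternates: every warp sees both paths.
flags = [i % 2 == 0 for i in range(128)]

print(divergent_warps(flags))          # 4
print(divergent_warps(sorted(flags)))  # 0: each warp is now uniform
```

Whether such a reordering pays off depends on the kernel, since the partitioning step has a cost of its own and can change the memory access pattern.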

TL;DR: There can be branches, but threads that take different branches have to be in different warps/wavefronts to avoid divergence and thus performance loss.

BlameTheBits answered Dec 26 '22 15:12