I read this article:
FPGA or GPU? - The evolution continues
And someone added a comment in which he wrote:
Since GPUs are SIMD any code with an “if-else” statement will cut your performance in half. Half of the cores will execute the if part of the statement while half of the cores are in idle and then the other half cores will do the else calculations while the first half of the cores remain idle.
I can't understand why.
Why would using if-else on a GPU (e.g. with OpenCL) cut performance in half?
The if-then-else statement provides a secondary path of execution when an "if" clause evaluates to false. You could use an if-then-else statement in the applyBrakes method to take some action if the brakes are applied when the bicycle is not in motion.
An alternative to IF-THEN-ELSE that I often use is a logical expression. A logical expression is written within parentheses '()' and is evaluated as either true or false. If the expression is true, 1 is returned; if it is false, 0 is returned.
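A minimal sketch of that idea, written in Python for readability. In C and in scalar OpenCL C a comparison yields the int 1 or 0; Python yields True/False, which behave as 1 and 0 in arithmetic. The helper names are illustrative, not from the original. Note that compilers often turn short if-else bodies into select instructions on their own, so this trick is not guaranteed to change the generated GPU code.

```python
def relu_branchless(x: int) -> int:
    """Return x if x > 0 else 0, without an if-else.

    (x > 0) evaluates to True/False, which Python treats as 1/0 in arithmetic,
    mirroring the 1/0 result of a scalar comparison in C.
    """
    return (x > 0) * x


def select_branchless(cond: bool, a: int, b: int) -> int:
    """Return a when cond is true and b otherwise, using cond as a 0/1 weight."""
    t = int(bool(cond))  # normalize to exactly 0 or 1
    return t * a + (1 - t) * b


# Every "thread" runs the same arithmetic; no thread takes a different path.
values = [3, -2, 0, 7]
print([relu_branchless(v) for v in values])                    # [3, 0, 0, 7]
print([select_branchless(v % 2 == 0, v, -v) for v in values])  # [-3, -2, 0, -7]
```

Both functions perform the same operations regardless of the input, which is the property that matters for threads running in lockstep.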
Branches in general do not affect performance, but branch divergence does: two threads taking different paths (e.g. one satisfies the if condition while the other does not). Because all threads of a GPU execute the same "line of code", some threads have to wait while code that is not part of their path is executed.
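To make the waiting concrete, here is a toy model of a group of threads executing in lockstep. All names and the fixed per-path cycle costs are hypothetical; real hardware schedules instructions in far more complex ways. Each path is issued once for the whole group under an execution mask, and a path is skipped only when no thread in the group needs it, so a divergent group pays for both paths.

```python
from typing import List, Tuple


def lockstep_if_else(xs: List[int], if_cost: int = 10,
                     else_cost: int = 10) -> Tuple[List[int], int]:
    """Simulate one SIMD group running `y = x * 2 if x >= 0 else -x`.

    Returns the per-thread results and the number of cycles the whole group
    spends, assuming each path costs a fixed number of cycles (an assumption).
    """
    mask = [x >= 0 for x in xs]   # which threads take the if-path
    out = [0] * len(xs)
    cycles = 0

    if any(mask):                 # issue the if-path for the whole group
        cycles += if_cost
        for i, x in enumerate(xs):
            if mask[i]:           # masked-off threads sit idle during this pass
                out[i] = x * 2
    if not all(mask):             # issue the else-path for the whole group
        cycles += else_cost
        for i, x in enumerate(xs):
            if not mask[i]:
                out[i] = -x
    return out, cycles


print(lockstep_if_else([1, 2, 3, 4]))    # ([2, 4, 6, 8], 10)
print(lockstep_if_else([1, -2, 3, -4]))  # ([2, 2, 6, 4], 20)
```

With equal path costs the divergent group takes twice as many cycles as the uniform one, which is where the "cut in half" figure in the quoted comment comes from.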
Strictly speaking, that is not quite true: only the threads within one warp (NVIDIA) or wavefront (AMD) execute the same "line of code". (Currently, the warp size of NVIDIA GPUs is 32 and the wavefront size of AMD GPUs is 64.)
So if there is an if-else block in your kernel, the worst-case scenario is indeed a 50% performance drop. It can be even worse: if there are n possible branches, performance can drop to 1/n of the performance without divergence (that is, with no branches, or with all threads in a warp/wavefront taking the same path). Of course, these worst cases only occur when essentially your whole kernel is embedded in an if-else (or switch) construct.
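The 1/n bound can be checked with a small counting sketch. Under the simplifying assumption that every branch has the same cost and the warp issues each distinct branch once, the efficiency of a warp is 1 divided by the number of distinct branches its threads take. The function name and the branch-ID encoding are hypothetical.

```python
from typing import Sequence


def warp_efficiency(branch_ids: Sequence[int]) -> float:
    """Fraction of useful work for one warp, assuming equal-cost branches.

    branch_ids[i] is the branch (e.g. switch case) taken by lane i. The warp
    issues every distinct branch once, and each lane is busy during only one
    of those passes.
    """
    if not branch_ids:
        raise ValueError("a warp needs at least one lane")
    passes = len(set(branch_ids))
    return 1.0 / passes


WARP_SIZE = 32  # NVIDIA warp size as quoted in the text
print(warp_efficiency([0] * WARP_SIZE))                          # 1.0
print(warp_efficiency([lane % 2 for lane in range(WARP_SIZE)]))  # 0.5
print(warp_efficiency([lane % 4 for lane in range(WARP_SIZE)]))  # 0.25
```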
But, as written above, this only happens if the threads taking different paths are in the same warp/wavefront. So it is up to you to write your code, rearrange your data, choose your algorithm, etc. so as to avoid branch divergence as far as possible.
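As a rough illustration of rearranging work, the sketch below counts the total number of branch passes across all warps for the same set of threads under different assignments of branch to thread. It uses no real GPU API; the warp size of 32, the assumption that threads are packed into warps in index order, and the function name are all assumptions. Interleaving the two branches lane by lane makes every warp diverge, while grouping threads so that each warp sees only one branch (for example by sorting the input by the condition beforehand) keeps every warp uniform.

```python
from typing import Sequence


def total_warp_passes(branch_ids: Sequence[int], warp_size: int = 32) -> int:
    """Sum over warps of the number of distinct branches each warp executes.

    Threads are assigned to warps in index order: threads 0..warp_size-1 form
    warp 0, the next warp_size threads form warp 1, and so on.
    """
    total = 0
    for start in range(0, len(branch_ids), warp_size):
        warp = branch_ids[start:start + warp_size]
        total += len(set(warp))
    return total


WARP = 32
n_threads = 4 * WARP  # four warps
interleaved = [tid % 2 for tid in range(n_threads)]        # even/odd split
grouped = [(tid // WARP) % 2 for tid in range(n_threads)]  # split per warp
print(total_warp_passes(interleaved))          # 8: every warp runs both branches
print(total_warp_passes(grouped))              # 4: each warp runs one branch
print(total_warp_passes(sorted(interleaved)))  # 4: sorting groups the branches
```

The amount of work per thread is identical in all three cases; only the mapping of work to warps changes.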
TL;DR: Branches are fine, but if different threads take different branches, they have to be in different warps/wavefronts to avoid divergence and the resulting performance loss.