I have a question about branch predication on GPUs. As far as I know, GPUs handle branches with predication.
For example, suppose I have code like this:
if (C)
A
else
B
Suppose A takes 40 cycles and B takes 50 cycles to execute. If, for one warp, both A and B are executed, does the branch take 90 cycles in total? Or are A and B overlapped, i.e., some instructions of A are executed, then there is a wait on a memory request, then some instructions of B are executed, then another wait on memory, and so on? Thanks.
February 16, 2021 at 08:21. Coherent, in terms of branches on a GPU, means that all threads in a vector take a similar, or "regular", path. For example, if 30 of 32 threads take the same side of the branch, it is very coherent; the same holds if 16 take it and they are the first 16, rather than a scattered pattern like 0 1 1 0 1 …
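To make the "first 16" versus scattered distinction concrete, here is a minimal sketch that counts how many groups of consecutive threads disagree on a branch. The function name `divergent_groups` and the group size of 16 are assumptions chosen only for illustration; the real granularity depends on the hardware.

```python
def divergent_groups(taken, group_size):
    """Count groups of consecutive threads whose members disagree on the branch."""
    count = 0
    for start in range(0, len(taken), group_size):
        group = taken[start:start + group_size]
        # A group diverges when some threads take the branch and some do not.
        if any(group) and not all(group):
            count += 1
    return count

# 32 threads; exactly 16 take the branch in both patterns.
first_half = [True] * 16 + [False] * 16       # the first 16 take it
scattered = [i % 2 == 1 for i in range(32)]   # 0 1 0 1 ... alternating

GROUP = 16  # hypothetical granularity, for illustration only
coherent_count = divergent_groups(first_half, GROUP)
scattered_count = divergent_groups(scattered, GROUP)
```

With the same number of threads taking the branch, the contiguous pattern leaves every group of 16 uniform, while the alternating pattern makes every group diverge. At a group size of 32 both patterns diverge, so the benefit depends on the granularity assumed.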
Using a random or pseudorandom bit (a pure guess) would give every branch an expected 50% correct prediction rate, which cannot be improved (or worsened) by reordering instructions. (With the simplest static prediction, "assume taken", compilers can reorder instructions to get better than 50% correct prediction.)
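The arithmetic behind that claim can be sketched as follows, assuming a branch that is taken with probability `p_taken` independently of the guess. Both function names are hypothetical.

```python
def expected_accuracy_random(p_taken):
    """Expected hit rate when guessing taken/not-taken with a fair coin."""
    # Guess "taken" half the time (right with probability p_taken) and
    # "not taken" half the time (right with probability 1 - p_taken).
    return 0.5 * p_taken + 0.5 * (1.0 - p_taken)

def expected_accuracy_always_taken(p_taken):
    """Expected hit rate for the static "assume taken" policy."""
    return p_taken
```

The random guess lands on 0.5 for any `p_taken`, whereas "assume taken" tracks `p_taken` itself, which is why arranging code so that branches are usually taken helps the static policy.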
Techopedia Explains Branch Prediction: A CPU using branch predication only executes statements if a predicate is true. One example is conditional logic. Since unnecessary code is not executed, the processor can work much more efficiently.
On modern processors, a branch takes between one and twenty CPU cycles. There are at least four categories of control flow instructions: unconditional branch (jmp on x86), call/return, conditional branch taken, and conditional branch not taken (e.g. je on x86).
All of the CUDA-capable architectures released so far operate like a SIMD machine. When there is branch divergence within a warp, both code paths are executed by all the threads in the warp, with the threads that are not following the active path executing the functional equivalent of a NOP. (I think I recall that a conditional execution flag is attached to each thread in a warp, which allows non-executing threads to be masked off.)
So in your example, the 90-cycle answer is probably a better approximation of what really happens than the alternative.
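As a rough illustration of that estimate, here is a minimal cost model for a single warp, assuming each path is issued once if at least one thread needs it and that the two paths run back to back. The name `warp_branch_cycles` is hypothetical, and the model deliberately ignores memory latency and anything the scheduler does with other warps.

```python
def warp_branch_cycles(predicates, cycles_a, cycles_b):
    """Estimate cycles one warp spends on `if (C) A else B`.

    predicates: one boolean per thread, the value of C for that thread.
    """
    run_a = any(predicates)        # at least one thread takes the if-path
    run_b = not all(predicates)    # at least one thread takes the else-path
    # Threads not on the active path are masked off, but the warp
    # still pays for every path it has to issue.
    return (cycles_a if run_a else 0) + (cycles_b if run_b else 0)

divergent = [i < 10 for i in range(32)]   # 10 threads take A, 22 take B
uniform_a = [True] * 32                   # the whole warp takes A
uniform_b = [False] * 32                  # the whole warp takes B
```

Under these assumptions the divergent warp costs 40 + 50 = 90 cycles, while a warp whose threads all agree pays only for the path it actually takes.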