I am reading this book to study the concepts of CUDA in depth. One of the chapters, which introduces the concept of SIMT, says:
The option for control flow divergence in SIMT also simplifies the requirement for programmers to use extra instructions to handle control flow compared to SSE.
I know this statement is based on the fact that SSE is a SIMD implementation while CUDA threads follow the SIMT model, but can anyone elaborate on this sentence with an example? Thanks in advance.
With SIMD, if you have a routine where some elements need to be handled differently from others, you need to explicitly take care of masking operations so that they are only applied to the correct elements. With CUDA's SIMT architecture you get the illusion of independent control flow on each thread, so you don't need to mask operations explicitly - this still happens "under the hood" of course, but the burden is lifted from the programmer.
Example: suppose you want to set all negative elements to zero. In CUDA:
if (X[tid] < 0)
    X[tid] = 0; // NB: every thread in the warp steps through this instruction,
                // but it only takes effect in threads where the condition was true
In SIMD (SSE):
__m128 mask = _mm_cmpge_ps(X, _mm_set1_ps(0)); // generate mask for all elements >= 0
X = _mm_and_ps(X, mask); // clear all elements which are < 0
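The difference becomes more visible once there is an else branch as well. The following is only a sketch, not taken from the book: the function names `abs_or_halve_scalar` and `abs_or_halve_sse` and the operation itself (negate negative elements, halve the rest) are made up for illustration. The scalar version is what each CUDA thread would effectively write; the SSE version has to compute both branches for all four lanes and then merge them by hand with a mask.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics */

/* Scalar reference: the per-element if/else that a CUDA thread could
 * express directly (with x[i] replaced by X[tid]). Hypothetical example. */
static void abs_or_halve_scalar(float *x, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        if (x[i] < 0.0f)
            x[i] = -x[i];
        else
            x[i] = 0.5f * x[i];
    }
}

/* SSE version: no per-lane branching, so both branches are evaluated
 * for every lane and the results are combined with an explicit mask. */
static void abs_or_halve_sse(float *x, size_t n)
{
    const __m128 zero = _mm_setzero_ps();
    const __m128 half = _mm_set1_ps(0.5f);
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        __m128 v        = _mm_loadu_ps(x + i);
        __m128 is_neg   = _mm_cmplt_ps(v, zero);  /* all-ones in lanes where v < 0 */
        __m128 then_val = _mm_sub_ps(zero, v);    /* "if" result, computed for all lanes */
        __m128 else_val = _mm_mul_ps(v, half);    /* "else" result, computed for all lanes */

        /* select: (mask & then_val) | (~mask & else_val) */
        __m128 r = _mm_or_ps(_mm_and_ps(is_neg, then_val),
                             _mm_andnot_ps(is_neg, else_val));
        _mm_storeu_ps(x + i, r);
    }

    /* Scalar tail for the last n % 4 elements. */
    for (; i < n; ++i)
        x[i] = (x[i] < 0.0f) ? -x[i] : 0.5f * x[i];
}
```

Calling both functions on copies of the same array should give identical results. The compare, the two masked ANDs and the OR are the "extra instructions" the quoted sentence refers to; in CUDA the hardware applies the equivalent predication for you.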