I'm wondering how can I exit from a thread, whose thread index is to big. I see two possibilities:
int i = threadIdx.x;
if(i >= count)
    return;
// do logic
or
int i = threadIdx.x;
if(i < count) {
    // do logic
}
I know, that both are correct, but which one affect more the performance?
Warp divergence occurs when two threads of the same warp diverge in their execution due to a branch instruction, where one thread branches and the other does not. This leads to serialization of the two threads by the CUDA hardware until their execution path converges again.
NVIDIA GPUs execute warps of 32 parallel threads using SIMT, which enables each thread to access its own registers, to load and store from divergent addresses, and to follow divergent control flow paths.
Each CUDA card has a maximum number of threads in a block (512, 1024, or 2048). Each thread also has a thread id: threadId = x + y Dx + z Dx Dy The threadId is like 1D representation of an array in memory. If you are working with 1D vectors, then Dy and Dz could be zero. Then threadIdx is x, and threadId is x.
Although both are the same in terms of performance, you should take into account that the first one is not recommended.
Return a thread within a kernel could cause an unexpected behaviour in the rest of your code.
By unexpected behaviour I mean whatever problem related to the minimum unit of threads that are grouped in a warp. In example if you have an if / else block in your kernel, this situation is known as thread divergence and in a normal case it results in threads remaining idle and others executing some instructions.
CUDA by Example Book, Chapter 5, Thread Cooperation:
But in the case of __syncthreads(), the result is somewhat tragic. The CUDA Architecture guarantees that no thread will advance to an instruction beyond the __syncthreads() until every thread in the block has executed the __syncthreads()
So, it is mainly related to the threads synchronization within a kernel. You can find a very good question / answer about this topic here: Can I use __syncthreads() after having dropped threads?
As I final note, I've also used that bad practice and no problem appeared but there is no guarantee that problems may arise in the future. It is something that I would not recommend
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With