The term thread divergence is used in CUDA; from my understanding it's a situation where different threads are assigned to do different tasks and this results in a big performance hit.
I was wondering, is there a similar penalty for doing this in openmp? For example, say I have a 6 core processor and a program with 6 threads. If I have a conditional that makes 3 threads perform a certain task, and then have the other three threads perform a completely different task, will there be a big performance hit? I guess in essence it's sort of using openmp to do MIMD.
Basically, I'm writing a program with openmp and CUDA. I want two threads to run a CUDA kernel while the other left over threads run C code. Thanks.
No, there is no performance hit for diverging threads using OpenMP. It is a problem in CUDA because of the way instructions are broadcast simultaneously to a set of cores. When an OpenMP thread targets a CPU core, each CPU core has its own independent set of instructions to follow, and it runs just like any other single-threaded program would.
You may see some of your cores being underutilized if you have synchronization barriers following thread divergence, because that would force faster threads to wait for the slower threads to catch up.
When speaking about CPU parallelism, there's no intrinsic performance hit from using a certain threading design pattern. Not at the theoretical level at least.
The only problem I see is that since the threads are doing different things which may have varying completion times, some of the threads may sit idle after finishing their work, waiting for the others to finish a longer task.
The term thread divergence in CUDA refers to the situation when not all threads of a bock evaluate a conditional with the same outcome. Such threads are said to diverge. If diverging threads are in the same warp then such threads may perform work serially which leads to performance loss.
I am not sure that OpenMP has the same issue, though. When different threads perform different work then load balancing may be used by the runtime perhaps, but it doesn't lead to work serialization necessarily.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With