I am fairly certain that a warp is only defined in CUDA. But maybe I'm wrong. What is a warp in terms of OpenCL?
It's not the same as a work-group, is it? Any relevant feedback is highly appreciated. Thanks!
In an NVIDIA GPU, the basic unit of execution is the warp. A warp is a collection of threads, 32 in current implementations, that are executed simultaneously by a Streaming Multiprocessor (SM). Multiple warps can be executed on an SM at once.
This execution model restricts the range of applications that map well onto GPUs. In the SIMT (Single Instruction, Multiple Threads) paradigm, threads are automatically grouped into 32-wide bundles called warps. Warps are the base unit used to schedule both computation on the Arithmetic and Logic Units (ALUs) and memory accesses.
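As a minimal sketch of this grouping (kernel name and launch configuration are illustrative, assuming a 1-D thread block), the CUDA code below derives each thread's warp index and lane index from its linear thread index; `warpSize` is a built-in device variable that is 32 on current NVIDIA hardware:

```cuda
#include <cstdio>

// Minimal sketch: show which warp and which lane within that warp
// each thread of a 1-D block belongs to.
__global__ void show_warp_layout()
{
    int tid  = threadIdx.x;          // linear thread index within the block
    int warp = tid / warpSize;       // warp index within the block
    int lane = tid % warpSize;       // lane index within the warp (0..31)

    // Only one thread per warp prints, to keep the output readable.
    if (lane == 0)
        printf("block %d: warp %d starts at thread %d\n",
               blockIdx.x, warp, tid);
}

int main()
{
    show_warp_layout<<<1, 128>>>();  // 128 threads -> 4 warps of 32
    cudaDeviceSynchronize();
    return 0;
}
```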
Warp divergence occurs when two threads of the same warp take different paths at a branch instruction, where one thread branches and the other does not. This leads to serialization of the two threads by the CUDA hardware until their execution paths converge again.
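A short, hedged illustration of divergence (the kernel and its launch are my own example, not taken from the answer above): the lanes of each warp take different sides of a branch depending on their lane index, so the hardware runs the two paths one after the other with the inactive lanes masked off, then reconverges.

```cuda
#include <cuda_runtime.h>

// Illustrative kernel: the two halves of each warp take different
// branch paths, so the paths are serialized within the warp.
__global__ void divergent_branch(float *out)
{
    int tid  = blockIdx.x * blockDim.x + threadIdx.x;
    int lane = threadIdx.x % warpSize;

    if (lane < 16) {
        // First half of the warp runs this path...
        out[tid] = tid * 2.0f;
    } else {
        // ...then the second half runs this path, while the first waits.
        out[tid] = tid * 0.5f;
    }
    // After the branch, the paths reconverge and the whole warp proceeds.
}

int main()
{
    float *d_out;
    cudaMalloc(&d_out, 128 * sizeof(float));
    divergent_branch<<<1, 128>>>(d_out);
    cudaDeviceSynchronize();
    cudaFree(d_out);
    return 0;
}
```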
If we use the maximum number of registers per thread (thereby minimizing the number of global memory accesses), the maximum number of threads running simultaneously per SM is 512 (32768 registers / 64 registers per thread = 512 threads per SM, i.e. 16 warps per SM).
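The register file size and thread limits vary by architecture, so rather than hard-coding the 32768/64 figures you can query them at run time. A minimal sketch using the CUDA runtime API follows; the 64 registers per thread is an assumed budget for this back-of-the-envelope calculation, not something the API reports:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);

    // Assumed per-thread register budget, matching the 64-register
    // example above.
    const int regs_per_thread = 64;

    int threads_by_regs = prop.regsPerMultiprocessor / regs_per_thread;
    int warps_by_regs   = threads_by_regs / prop.warpSize;

    printf("registers per SM      : %d\n", prop.regsPerMultiprocessor);
    printf("hardware thread limit : %d threads/SM\n",
           prop.maxThreadsPerMultiProcessor);
    printf("at %d regs/thread: %d threads/SM (%d warps/SM)\n",
           regs_per_thread, threads_by_regs, warps_by_regs);
    return 0;
}
```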
It isn't defined in the OpenCL standard. A warp is effectively a hardware thread: CUDA "threads" are not threads in the usual sense, and they map onto a warp as individual SIMD lanes through some clever hardware/software mapping. In OpenCL terms, a warp is a collection of work-items, and there can be multiple warps in a work-group.
An OpenCL subgroup was designed to be compatible with a hardware thread, so it can represent a warp inside an OpenCL kernel. However, it is entirely up to NVIDIA whether to implement subgroups, and a subgroup cannot expose every feature NVIDIA offers for warps: subgroups are part of a portable standard, while NVIDIA can do anything it likes on its own devices.
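For a concrete sense of the kind of warp-level feature in question, here is a minimal CUDA sketch of a warp-wide sum using `__shfl_down_sync`, which exchanges values between lanes without touching shared memory. OpenCL subgroups expose a comparable, but standardized and therefore more limited, set of operations (e.g. subgroup reductions) when the vendor chooses to support them.

```cuda
#include <cstdio>

// Warp-wide sum using shuffle: each lane contributes its own value and
// lane 0 ends up holding the total, with no shared memory involved.
__global__ void warp_sum(int *result)
{
    int lane = threadIdx.x % warpSize;
    int val  = lane + 1;                       // lanes contribute 1..32

    // Tree reduction within the warp using shuffle-down.
    for (int offset = warpSize / 2; offset > 0; offset /= 2)
        val += __shfl_down_sync(0xffffffff, val, offset);

    if (lane == 0)
        *result = val;                         // 1 + 2 + ... + 32 = 528
}

int main()
{
    int *d_result, h_result = 0;
    cudaMalloc(&d_result, sizeof(int));
    warp_sum<<<1, 32>>>(d_result);
    cudaMemcpy(&h_result, d_result, sizeof(int), cudaMemcpyDeviceToHost);
    printf("warp sum = %d\n", h_result);
    cudaFree(d_result);
    return 0;
}
```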