Occupancy is defined as the number of active warps divided by the maximum number of warps supported on one Streaming Multiprocessor (SM). Say I have 4 blocks running on one SM and each block has 320 threads, i.e., 10 warps, so there are 40 warps on the SM. The occupancy is then 40/48, assuming the maximum number of warps on one SM is 48 (CC 2.x).
But in total I have 320 * 4 = 1280 threads running on one SM, and there are only 48 CUDA cores on one SM. Why is the occupancy not 100%? I am using all the CUDA cores...
I am pretty sure I am missing something...
The CUDA Occupancy Calculator allows you to compute the multiprocessor occupancy of a GPU by a given CUDA kernel. The multiprocessor occupancy is the ratio of active warps to the maximum number of warps supported on a multiprocessor of the GPU.
Definition of Occupancy: Occupancy is defined as the ratio of active warps on an SM to the maximum number of active warps supported by the SM.
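To make the definition concrete, here is a minimal sketch of the calculation using the numbers from the question (4 blocks of 320 threads, 48 warps maximum per SM). The function names are made up for illustration, and the sketch assumes the block count per SM is already known; a real occupancy calculation also has to account for register, shared memory, and blocks-per-SM limits, which are ignored here.

```python
WARP_SIZE = 32  # threads per warp on NVIDIA GPUs


def warps_per_block(threads_per_block: int) -> int:
    """Warps needed for one block; a partially filled warp still counts as one."""
    return (threads_per_block + WARP_SIZE - 1) // WARP_SIZE


def occupancy(blocks_per_sm: int, threads_per_block: int, max_warps_per_sm: int) -> float:
    """Ratio of active warps to the maximum warps the SM supports.

    Assumes blocks_per_sm blocks are actually resident on the SM
    (resource limits are not modelled in this sketch).
    """
    active_warps = blocks_per_sm * warps_per_block(threads_per_block)
    return min(active_warps, max_warps_per_sm) / max_warps_per_sm


if __name__ == "__main__":
    # Numbers from the question: 4 blocks x 320 threads, 48 warps max (CC 2.x).
    print(warps_per_block(320))   # warps in one block
    print(occupancy(4, 320, 48))  # 40 active warps out of 48
```

Note that the number of CUDA cores never appears in the formula: occupancy counts resident warps, not execution units.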
How many warps will be generated during the kernel execution? Explanation: There are ceil(800/16.0) = 50 blocks in the x direction and ceil(600/16.0) = 38 blocks in the y direction. Each block contributes (16*16)/32 = 8 warps, so the grid generates 50 * 38 * 8 = 15,200 warps in total.
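The same count can be checked with a few lines of code. This is only a sketch: it assumes an 800 x 600 problem covered by 16 x 16 thread blocks, as in the explanation above, and `total_warps` is a hypothetical helper name.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division for positive integers."""
    return (a + b - 1) // b


def total_warps(width: int, height: int, block_x: int, block_y: int,
                warp_size: int = 32) -> int:
    """Total warps launched for a 2D grid that covers width x height elements."""
    blocks_x = ceil_div(width, block_x)                   # blocks along x
    blocks_y = ceil_div(height, block_y)                  # blocks along y
    warps_in_block = ceil_div(block_x * block_y, warp_size)
    return blocks_x * blocks_y * warps_in_block


if __name__ == "__main__":
    # 50 blocks in x, 38 blocks in y, 8 warps per block
    print(total_warps(800, 600, 16, 16))
```

The last row of blocks in y is only partly filled (600 is not a multiple of 16), but those blocks still launch all 8 warps each, which is why the ceiling is used.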
Active Warps: A warp is active from the time it is scheduled on a multiprocessor until it completes the last instruction. Each warp scheduler maintains its own list of assigned active warps.
Because occupancy has nothing to do with cores. CUDA is a pipelined, SIMD-style architecture. Your 48 cores are fed per-warp instructions from a pipeline (dual-issued, in fact). You need a lot of warps to keep the instruction pipeline full; otherwise the cores will stall waiting for work. That is why occupancy is a somewhat useful metric for quantifying the ability of a given kernel to supply enough parallel work to achieve reasonable performance.
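A toy model may help show why more warps keep the pipeline fuller. This is a deliberately simplified sketch, not how the hardware is specified: it assumes a single scheduler that can issue one warp instruction per cycle, and that each warp must wait a fixed number of cycles after issuing before its next instruction is ready. The latency value used below is hypothetical.

```python
def issue_utilization(active_warps: int, latency_cycles: int) -> float:
    """Fraction of cycles in which the scheduler has a ready warp to issue.

    Toy assumptions: one instruction issued per cycle at most, and every
    warp is stalled for latency_cycles between its own instructions.
    With N warps taking turns, at most N instructions issue per
    latency_cycles window, so utilization is N / latency, capped at 1.
    """
    return min(1.0, active_warps / latency_cycles)


if __name__ == "__main__":
    latency = 20  # hypothetical pipeline latency in cycles
    for warps in (1, 5, 10, 20, 40):
        print(warps, issue_utilization(warps, latency))
```

Under these assumptions, a single warp leaves the pipeline idle most of the time, while enough resident warps hide the latency completely. Beyond that point extra warps add nothing, which is also why occupancy is only a "somewhat useful" metric.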