 

Why is CUDA occupancy defined in terms of the number of active warps over the maximum number of warps supported?

Tags:

cuda

Occupancy is defined as the number of active warps divided by the maximum number of warps supported on one Streaming Multiprocessor (SM). Suppose I have 4 blocks running on one SM, each with 320 threads, i.e., 10 warps, so there are 40 warps on the SM. The occupancy is then 40/48, assuming the maximum number of warps per SM is 48 (compute capability 2.x).
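
The arithmetic in the question can be checked with a small sketch. It assumes the compute capability 2.x limits of 48 resident warps and 8 resident blocks per SM. It deliberately ignores register and shared-memory limits, which can reduce the number of resident blocks further in a real kernel. The function names are hypothetical helpers, not part of any CUDA API.

```python
import math

WARP_SIZE = 32

def warps_per_block(threads_per_block: int) -> int:
    # A partially filled warp still occupies a whole warp slot.
    return math.ceil(threads_per_block / WARP_SIZE)

def resident_blocks(threads_per_block: int, max_warps_per_sm: int = 48,
                    max_blocks_per_sm: int = 8) -> int:
    # Hypothetical helper: blocks that fit when only warp slots and block
    # slots are considered (registers and shared memory are ignored).
    return min(max_blocks_per_sm,
               max_warps_per_sm // warps_per_block(threads_per_block))

def occupancy(threads_per_block: int, blocks_per_sm: int,
              max_warps_per_sm: int = 48) -> float:
    # Active warps on the SM divided by the maximum warps it supports.
    active_warps = warps_per_block(threads_per_block) * blocks_per_sm
    return active_warps / max_warps_per_sm

blocks = resident_blocks(320)          # 48 // 10 = 4 blocks
print(blocks, occupancy(320, blocks))  # 4 blocks, 40/48 (about 0.833)
```

Under these assumptions a fifth 320-thread block would need 10 more warp slots, but only 8 remain. That is why 4 blocks, and therefore 40 of 48 warps, is the ceiling.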

But in total I have 320 * 4 = 1280 threads running on one SM, and there are only 48 CUDA cores on one SM. Why is the occupancy not 100%? I am already using all the CUDA cores...

I am pretty sure I am missing something...

asked Mar 06 '13 20:03 by szli


People also ask

What is Cuda occupancy?

The CUDA Occupancy Calculator allows you to compute the multiprocessor occupancy of a GPU by a given CUDA kernel. The multiprocessor occupancy is the ratio of active warps to the maximum number of warps supported on a multiprocessor of the GPU.

What does the occupancy of a kernel mean?

Occupancy is defined as the ratio of active warps on an SM to the maximum number of active warps supported by the SM.

How many warps are created when the kernel is launched?

It depends on the grid and block dimensions. For example, for a kernel covering an 800×600 domain with 16×16 thread blocks, there are ceil(800/16.0) = 50 blocks in the x direction and ceil(600/16.0) = 38 blocks in the y direction. Each block contributes (16*16)/32 = 8 warps, so the launch creates 50 * 38 * 8 = 15,200 warps in total.
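
The same count can be written as a short sketch. It assumes a 2D launch whose grid is rounded up with ceiling division to cover the whole domain, and a warp size of 32. The `total_warps` helper is hypothetical.

```python
import math

def total_warps(width: int, height: int, block_x: int, block_y: int,
                warp_size: int = 32):
    # Round the grid up so that every element of the domain is covered.
    grid_x = math.ceil(width / block_x)
    grid_y = math.ceil(height / block_y)
    # A block's threads are split into warps; a partial warp counts as one.
    warps_per_block = math.ceil(block_x * block_y / warp_size)
    return grid_x, grid_y, warps_per_block, grid_x * grid_y * warps_per_block

print(total_warps(800, 600, 16, 16))  # (50, 38, 8, 15200)
```

Only some of these warps are resident on an SM at any moment. The per-SM limits shown in the question decide how many.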

What is an active warp?

A warp is active from the time it is scheduled on a multiprocessor until it completes the last instruction. Each warp scheduler maintains its own list of assigned active warps.


1 Answer

Because occupancy has nothing to do with cores. CUDA is a pipelined, SIMD-style architecture. Your 48 cores are fed per-warp instructions from a pipeline (dual-issued, in fact). Those 1280 threads are resident on the SM, not all executing at once. In each cycle the warp schedulers issue instructions from only a few warps, and an instruction's result is not available until many cycles later. You need a lot of warps to keep the instruction pipeline full; otherwise all the cores will stall. That is why occupancy is a somewhat useful metric for quantifying how well a given kernel supplies enough parallel work to achieve reasonable performance.
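
The "lots of warps to keep the pipeline full" argument can be sketched with Little's law: the number of instructions in flight equals latency times issue rate. The latency and issue-rate values below are hypothetical placeholders, not measured figures for any particular GPU. The sketch also assumes each warp can have a fixed number of independent instructions in flight.

```python
import math

def warps_to_hide_latency(latency_cycles: int, issue_per_cycle: int,
                          independent_instr_per_warp: int = 1) -> int:
    # Little's law: instructions in flight = latency * issue rate.
    in_flight = latency_cycles * issue_per_cycle
    # Each warp contributes only its independent (non-dependent) instructions.
    return math.ceil(in_flight / independent_instr_per_warp)

# Hypothetical numbers: 20-cycle latency, 2 warp instructions issued per cycle.
print(warps_to_hide_latency(20, 2))     # 40 warps with no instruction-level parallelism
print(warps_to_hide_latency(20, 2, 2))  # 20 warps if each warp has 2 independent instructions
```

The point is that the warp count needed depends on latency and issue rate, not on the number of cores. More instruction-level parallelism per warp lowers the occupancy needed to keep the pipeline busy.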

answered Oct 18 '22 01:10 by talonmies