Why is CUDA occupancy defined as the number of active warps over the maximum number of warps supported?

Tags:

cuda

Occupancy is defined as the number of active warps divided by the maximum number of warps supported on one Streaming Multiprocessor (SM). Say I have 4 blocks running on one SM and each block has 320 threads, i.e., 10 warps, so there are 40 warps on that SM. The occupancy is then 40/48, assuming the maximum number of warps per SM is 48 (CC 2.x).

But in total I have 320 * 4 = 1280 threads running on one SM, and there are only 48 CUDA cores on one SM. Why is the occupancy not 100%? I am using all the CUDA cores...

I am pretty sure I am missing something...
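The arithmetic above can be checked with a few lines of Python. This is only a sketch of the definition as stated in the question: the function name `occupancy` and its parameters are made up for illustration, and the limit of 48 warps per SM is the CC 2.x figure assumed above.

```python
import math

WARP_SIZE = 32  # threads per warp


def occupancy(blocks_per_sm, threads_per_block, max_warps_per_sm):
    """Return (active_warps, occupancy) for one SM.

    Hypothetical helper: it applies only the definition quoted in the
    question and ignores register and shared-memory limits.
    """
    # A partially filled warp still occupies a whole warp slot.
    warps_per_block = math.ceil(threads_per_block / WARP_SIZE)
    active_warps = blocks_per_sm * warps_per_block
    return active_warps, active_warps / max_warps_per_sm


# Numbers from the question: 4 blocks of 320 threads, 48 warps max (CC 2.x).
active_warps, occ = occupancy(blocks_per_sm=4,
                              threads_per_block=320,
                              max_warps_per_sm=48)
resident_threads = 4 * 320

print(active_warps)        # 40
print(resident_threads)    # 1280
print(f"{occ:.1%}")        # 83.3%
```

Note that the core count (48) never enters the calculation; only warp slots do.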

517

asked Mar 06 '13 20:03

szli

1 Answer

Because occupancy has nothing to do with cores. CUDA is a pipelined, SIMD-style architecture. Your 48 cores are fed per-warp instructions from a pipeline (dual-issued, in fact). You need a lot of warps in flight to keep that instruction pipeline full; otherwise all the cores will stall. That is why occupancy is a somewhat useful metric: it quantifies the ability of a given kernel to supply enough parallel work to achieve reasonable performance.
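To see roughly why "a lot of warps" are needed, here is a toy back-of-the-envelope model in Python. It is not a description of the real scheduler: it assumes each warp issues one instruction and then waits a fixed number of cycles before its next dependent instruction is ready, so the number of warps needed to fill every issue slot is latency times issue rate. The latency and issue-rate values below are illustrative assumptions, not measured figures for any GPU, and the function name is hypothetical.

```python
import math


def warps_to_hide_latency(latency_cycles, warp_instructions_per_cycle):
    """Toy estimate of warps needed to keep every issue slot busy.

    Assumes each warp has one instruction in flight at a time and must
    wait `latency_cycles` before it can issue again.
    """
    return math.ceil(latency_cycles * warp_instructions_per_cycle)


# Illustrative values only (assumptions, not hardware specifications).
ASSUMED_LATENCY_CYCLES = 20   # cycles until a dependent instruction can issue
ASSUMED_ISSUE_RATE = 2        # warp instructions the SM can issue per cycle

needed = warps_to_hide_latency(ASSUMED_LATENCY_CYCLES, ASSUMED_ISSUE_RATE)
print(needed)  # 40
```

Under these assumed numbers, a handful of resident warps would leave most issue slots empty no matter how many cores the SM has, which is the sense in which occupancy measures available parallel work rather than core usage.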

184

answered Oct 18 '22 01:10

talonmies
