What is the maximum block count possible in CUDA?

Tags:

cuda

Theoretically, you can have 65535 blocks per dimension of the grid, up to 65535 * 65535 * 65535.

If you call a kernel like this:

kernel<<< BLOCKS,THREADS >>>()

(without dim3 objects), what is the maximum number available for BLOCKS?

In one of my applications I set it to 192000 and it seemed to work fine. The problem is that the kernel modifies the contents of a huge array, so although I checked some parts of the array and they looked fine, I can't be sure the kernel didn't misbehave in other parts.

For the record, I have a compute capability 2.1 GPU, a GTX 500 ti.

975

asked Mar 23 '12 14:03

STE

1 Answer

With compute capability 3.0 or higher, you can have up to 2^31 - 1 blocks in the x-dimension, and at most 65535 blocks in the y and z dimensions. On compute capability 2.x devices, such as the one in the question, the x-dimension is also limited to 65535 blocks. See Table H.1, "Feature Support per Compute Capability", in the CUDA C Programming Guide, version 9.1.
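Rather than relying on the table, you can ask the runtime what the installed device actually supports. The following is a minimal sketch, assuming device index 0 is the GPU of interest; the helper name `maxGridX` is made up for illustration, while `cudaGetDeviceProperties` and the `maxGridSize` field are part of the CUDA runtime API.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: prints the grid-size limits reported by the runtime
// for one device and returns the x-dimension limit, or -1 if the query fails.
int maxGridX(int device)
{
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, device);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties failed: %s\n",
                cudaGetErrorString(err));
        return -1;
    }

    printf("Compute capability %d.%d\n", prop.major, prop.minor);
    printf("Max grid size: %d x %d x %d\n",
           prop.maxGridSize[0], prop.maxGridSize[1], prop.maxGridSize[2]);

    // maxGridSize[0] is the limit that applies to kernel<<<BLOCKS, THREADS>>>.
    return prop.maxGridSize[0];
}
```

Comparing BLOCKS against the returned value before launching tells you whether a given launch configuration is valid on that device.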

As Pavan pointed out, if you do not provide a dim3 for the grid configuration, you only use the x-dimension, so the per-dimension limit for x applies here.
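If the block count you need exceeds the x-dimension limit, one option is to spread the blocks over x and y with a dim3 and rebuild a linear block index inside the kernel. The sketch below is one way to do that, not the only one. The names `splitGrid`, `fill`, and `launchFill` are hypothetical, the default limit of 65535 is an assumption matching compute capability 2.x, and the total block count is assumed not to exceed limit * limit. It also checks the launch status, since a launch with an invalid configuration fails without running the kernel and is only noticed if the error is queried.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: splits a total block count into a 2D grid whose x and y
// extents both stay within `limit`. The grid may hold a few surplus blocks.
dim3 splitGrid(unsigned long long totalBlocks, unsigned int limit = 65535)
{
    unsigned int y = (unsigned int)((totalBlocks + limit - 1) / limit);
    if (y == 0) y = 1;
    unsigned int x = (unsigned int)((totalBlocks + y - 1) / y);
    if (x == 0) x = 1;
    return dim3(x, y, 1);
}

// Illustrative kernel: writes `value` to every element of `data`.
__global__ void fill(int *data, size_t n, int value)
{
    // Rebuild a linear block index from the 2D grid.
    size_t block = (size_t)blockIdx.y * gridDim.x + blockIdx.x;
    size_t i = block * blockDim.x + threadIdx.x;
    if (i < n) data[i] = value;  // surplus threads and blocks do nothing
}

// Launches `fill` and returns the first error seen, either from the launch
// configuration (cudaGetLastError) or from execution (cudaDeviceSynchronize).
cudaError_t launchFill(int *d_data, size_t n, int value, unsigned int threads)
{
    unsigned long long blocks = (n + threads - 1) / threads;
    fill<<<splitGrid(blocks), threads>>>(d_data, n, value);
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) return err;
    return cudaDeviceSynchronize();
}
```

For the 192000 blocks from the question, `splitGrid(192000)` gives a 64000 x 3 grid, which stays within 65535 in each dimension. Checking the value returned by `launchFill` against `cudaSuccess` is what lets you trust the array contents without inspecting them by hand.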

112

answered Oct 21 '22 20:10

perreal
