What is the maximum block count possible in CUDA?

Tags:

cuda

Theoretically, you can have 65535 blocks per dimension of the grid, up to 65535 * 65535 * 65535.
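Because the limit applies per dimension, one common way to use more than 65535 blocks on such a device is to spread a linear block count over two grid dimensions. The helper below is a minimal host-side sketch of that idea. `Grid2D`, `splitBlocks` and `linearBlockIndex` are hypothetical names standing in for a `dim3` grid and the index math a kernel would do. It assumes the total fits in `maxDim * maxDim` blocks.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for CUDA's dim3 (only x and y are used here).
struct Grid2D {
    std::uint32_t x;
    std::uint32_t y;
};

// Split a linear block count across x and y so that neither dimension
// exceeds maxDim (65535 per dimension on compute capability 2.x).
Grid2D splitBlocks(std::uint64_t totalBlocks, std::uint32_t maxDim = 65535) {
    assert(maxDim > 0 && totalBlocks >= 1 &&
           totalBlocks <= static_cast<std::uint64_t>(maxDim) * maxDim);
    if (totalBlocks <= maxDim) {
        return {static_cast<std::uint32_t>(totalBlocks), 1};
    }
    // Rows needed when each row holds at most maxDim blocks (rounded up).
    std::uint64_t rows = (totalBlocks + maxDim - 1) / maxDim;
    // Shrink the row width so the unused blocks in the last row stay few.
    std::uint64_t cols = (totalBlocks + rows - 1) / rows;
    return {static_cast<std::uint32_t>(cols), static_cast<std::uint32_t>(rows)};
}

// Inside a kernel the equivalent would be blockIdx.x + blockIdx.y * gridDim.x;
// blocks whose linear index is >= totalBlocks should return early.
std::uint64_t linearBlockIndex(std::uint32_t bx, std::uint32_t by, std::uint32_t gridX) {
    return static_cast<std::uint64_t>(bx) + static_cast<std::uint64_t>(by) * gridX;
}
```

For 192000 blocks this yields a 64000 x 3 grid, which stays under the 65535 per-dimension limit.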

If you call a kernel like this:

kernel<<<BLOCKS, THREADS>>>();

(without dim3 objects), what is the maximum number available for BLOCKS?

In one of my applications I set it to 192000 and it seemed to work fine. The problem is that the kernel modifies a huge array, so although I checked some parts of the array and they looked fine, I can't be sure the kernel didn't behave strangely elsewhere.

For the record, I have a compute capability 2.1 GPU, a GTX 500 Ti.

asked Mar 23 '12 at 14:03 by STE


People also ask

Is there a maximum number of blocks in CUDA?

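Section K.1 (Features and Technical Specifications) of the CUDA C Programming Guide states that the maximum number of threads per block and the maximum x- or y-dimension of a block are both 1024. Thus, the maximum block size is 1024 threads. Note that this limits the size of each block, not the number of blocks in the grid.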
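The per-dimension limits and the total-thread limit interact, so a block can pass each dimension check and still be too large. The check below is a small sketch assuming the compute capability 2.0+ values (x and y up to 1024, z up to 64, at most 1024 threads in total). `isValidBlockShape` and `threadsInBlock` are hypothetical helpers, not CUDA API functions.

```cpp
#include <cassert>
#include <cstdint>

// Assumed limits for compute capability 2.0 and later.
constexpr std::uint32_t kMaxThreadsPerBlock = 1024;
constexpr std::uint32_t kMaxBlockDimX = 1024;
constexpr std::uint32_t kMaxBlockDimY = 1024;
constexpr std::uint32_t kMaxBlockDimZ = 64;

bool isValidBlockShape(std::uint32_t x, std::uint32_t y, std::uint32_t z) {
    if (x == 0 || y == 0 || z == 0) return false;
    if (x > kMaxBlockDimX || y > kMaxBlockDimY || z > kMaxBlockDimZ) return false;
    // The product is the binding constraint: 32 x 32 x 1 fits, 32 x 32 x 2 does not.
    std::uint64_t total = static_cast<std::uint64_t>(x) * y * z;
    return total <= kMaxThreadsPerBlock;
}

// Total threads in a block whose shape is already known to be valid.
std::uint32_t threadsInBlock(std::uint32_t x, std::uint32_t y, std::uint32_t z) {
    assert(isValidBlockShape(x, y, z));
    return x * y * z;
}
```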

How many blocks and threads can a CUDA kernel use?

The CUDA architecture limits the number of threads per block (at most 1024 on compute capability 2.0 and later). The dimensions of the thread block are accessible within the kernel through the built-in blockDim variable.
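Since the block size is capped, the number of blocks is usually derived from the problem size by rounding up. The sketch below shows that arithmetic on the host. `blocksFor` and `globalIndex` are illustrative names. `globalIndex` mirrors the common in-kernel expression `blockIdx.x * blockDim.x + threadIdx.x`.

```cpp
#include <cassert>
#include <cstdint>

// Round up so every element gets a thread: ceil(n / threadsPerBlock).
std::uint64_t blocksFor(std::uint64_t n, std::uint32_t threadsPerBlock) {
    assert(threadsPerBlock > 0 && threadsPerBlock <= 1024);
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

// Host-side mirror of blockIdx.x * blockDim.x + threadIdx.x. Threads whose
// index is >= n must skip their work, since the last block is often only
// partly used.
std::uint64_t globalIndex(std::uint32_t blockIdxX, std::uint32_t blockDimX,
                          std::uint32_t threadIdxX) {
    return static_cast<std::uint64_t>(blockIdxX) * blockDimX + threadIdxX;
}
```

For example, 1000 elements with 256 threads per block need 4 blocks, and the last 24 threads of the fourth block have nothing to do.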

What is the maximum number of simultaneous blocks that will run on a single SM?

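Each SM has a limited number of registers and a limited amount of shared memory, which caps how many blocks can be resident at once. In addition, there is a hard per-SM block limit. For example, no more than 16 thread blocks can run simultaneously on a single SM with the Kepler microarchitecture.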
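A rough way to see how these limits combine is to take the smallest of them. The estimate below assumes Kepler-class values (compute capability 3.0/3.5: 16 blocks, 2048 threads, 65536 registers and up to 48 KB of shared memory per SM). It deliberately ignores register and shared-memory allocation granularity, so real occupancy can be lower. `SmLimits` and `residentBlocks` are hypothetical names. For real numbers, prefer the CUDA occupancy calculator or the runtime's occupancy API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Assumed per-SM limits; check your device's compute capability table.
struct SmLimits {
    std::uint32_t maxBlocks;       // resident blocks per SM
    std::uint32_t maxThreads;      // resident threads per SM
    std::uint32_t registers;       // 32-bit registers per SM
    std::uint32_t sharedMemBytes;  // shared memory per SM
};

constexpr SmLimits kKepler{16, 2048, 65536, 48 * 1024};

// Simplified estimate: the most restrictive limit wins.
std::uint32_t residentBlocks(const SmLimits& sm, std::uint32_t threadsPerBlock,
                             std::uint32_t regsPerThread, std::uint32_t smemPerBlock) {
    assert(threadsPerBlock > 0);
    std::uint32_t blocks = std::min(sm.maxBlocks, sm.maxThreads / threadsPerBlock);
    if (regsPerThread > 0) {
        blocks = std::min(blocks, sm.registers / (regsPerThread * threadsPerBlock));
    }
    if (smemPerBlock > 0) {
        blocks = std::min(blocks, sm.sharedMemBytes / smemPerBlock);
    }
    return blocks;
}
```

With 64-thread blocks the 16-block cap is what binds. With 256-thread blocks, the thread and register limits allow only 8.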

What is the maximum number of threads that can be launched on the GPU?

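Each CUDA device has a maximum number of threads per block: 512 on compute capability 1.x and 1024 on 2.0 and later. Each thread also has a thread ID. For a block of size (Dx, Dy, Dz), the thread at index (x, y, z) has $\text{threadId} = x + y\,D_x + z\,D_x D_y$. The thread ID is like the 1D representation of an array in memory. If you are working with 1D vectors, y and z are zero (with Dy = Dz = 1), so the thread ID is simply x.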
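The thread-ID formula is the same row-major flattening used for a 3D array. The function below is a small host-side sketch of it. `linearThreadId` is an illustrative name, and its parameters correspond to threadIdx (x, y, z) and blockDim (Dx, Dy, Dz) inside a kernel.

```cpp
#include <cassert>
#include <cstdint>

// Linear thread ID of the thread at (x, y, z) in a block of size (Dx, Dy, Dz).
// Dz only bounds z; it does not appear in the formula itself.
std::uint32_t linearThreadId(std::uint32_t x, std::uint32_t y, std::uint32_t z,
                             std::uint32_t Dx, std::uint32_t Dy, std::uint32_t Dz) {
    assert(x < Dx && y < Dy && z < Dz);
    return x + y * Dx + z * Dx * Dy;
}
```

In a 32 x 32 x 1 block, the last thread (31, 31, 0) gets ID 1023. In a 1D block, (7, 0, 0) gets ID 7.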


1 Answer

With compute capability 3.0 or higher, you can have up to 2^31 - 1 blocks in the x-dimension and at most 65535 blocks in the y- and z-dimensions. See Table H.1 (Feature Support per Compute Capability) of the CUDA C Programming Guide, version 9.1.

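As Pavan pointed out, if you do not provide a dim3 for the grid configuration, only the x-dimension is used, so the per-dimension limit applies here. On a compute capability 2.x device like the one in the question, that limit is 65535. A launch with 192000 blocks should therefore fail with an invalid-configuration error instead of running, and checking cudaGetLastError() after the launch would reveal this.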
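The limits above can be summarised in a small lookup. The sketch below encodes the per-dimension grid limits as I understand them from the programming guide (x is 65535 before compute capability 3.0 and 2^31 - 1 from 3.0 on; y and z are 65535; z was 1 on 1.x). It then checks whether a plain integer block count fits a 1D launch. `GridLimits`, `gridLimits` and `fitsIn1DGrid` are hypothetical helpers. On a real device, `cudaDeviceProp::maxGridSize` from `cudaGetDeviceProperties` gives the authoritative values.

```cpp
#include <cassert>
#include <cstdint>

// Maximum grid size per dimension (assumed values, see lead-in).
struct GridLimits {
    std::uint64_t x, y, z;
};

GridLimits gridLimits(int ccMajor) {
    assert(ccMajor >= 1);
    if (ccMajor >= 3) return {2147483647ULL, 65535, 65535};
    if (ccMajor == 2) return {65535, 65535, 65535};
    return {65535, 65535, 1};  // compute capability 1.x grids are 2D
}

// kernel<<<blocks, threads>>>() with a plain integer uses only gridDim.x,
// so only the x limit matters.
bool fitsIn1DGrid(std::uint64_t blocks, int ccMajor) {
    return blocks >= 1 && blocks <= gridLimits(ccMajor).x;
}
```

With these values, 192000 blocks fit on compute capability 3.0 and later but not on a 2.1 device.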

answered Oct 21 '22 at 20:10 by perreal