What is the maximum number of blocks in a grid that can be created per kernel launch? I am slightly confused here, since
the compute capability table here says that there can be 65535 blocks per grid dimension in CUDA compute capability 2.0.
Does that mean the total number of blocks = 65535*65535?
Or does it mean that you can arrange at most 65535 blocks into a 1D grid of 65535 blocks or a 2D grid of sqrt(65535) * sqrt(65535)?
Thank you.
Theoretically, you can have 65535 blocks per dimension of the grid, up to 65535 * 65535 * 65535.
Each SM can have up to 16 active blocks on Kepler and 8 active blocks on Fermi. You also need to think in terms of warps.
Hardware limits the number of blocks in a single launch to 65,535 per grid dimension. Hardware also limits the number of threads per block with which we can launch a kernel. For many GPUs, maxThreadsPerBlock = 512 (or 1024, version 2.
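Each CUDA card has a maximum number of threads in a block: 512 on compute 1.x cards and 1024 on compute 2.x and later. (2048 is not a per-block limit; it is the maximum number of resident threads per SM on some architectures.)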
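To make the two limits concrete, here is a rough host-side sketch of how you might pick a grid shape for `n` elements when a 1D grid would need more than 65535 blocks. The helper `gridFor` and the struct `GridShape` are hypothetical names made up for this example, not part of the CUDA API, and the 65535 limit is the compute 2.x figure quoted in this thread.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical container for a 2D grid shape (not a CUDA type).
struct GridShape {
    std::uint32_t x;
    std::uint32_t y;
};

// Hypothetical helper: choose a grid with at least ceil(n / threadsPerBlock)
// blocks while keeping each dimension within the 65535 per-dimension limit.
GridShape gridFor(std::uint64_t n, std::uint32_t threadsPerBlock) {
    const std::uint64_t kMaxDim = 65535;  // assumed per-dimension limit (compute 2.x)
    assert(threadsPerBlock > 0);

    // Ceiling division: number of blocks needed to cover n elements.
    std::uint64_t blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    if (blocks == 0) {
        blocks = 1;  // a launch needs at least one block
    }

    // Small problems fit in a 1D grid.
    if (blocks <= kMaxDim) {
        return GridShape{static_cast<std::uint32_t>(blocks), 1};
    }

    // Otherwise fold the blocks into rows of a 2D grid.
    std::uint64_t y = (blocks + kMaxDim - 1) / kMaxDim;
    assert(y <= kMaxDim);  // beyond this a 3D grid would be needed
    std::uint64_t x = (blocks + y - 1) / y;
    return GridShape{static_cast<std::uint32_t>(x), static_cast<std::uint32_t>(y)};
}
```

For 100,000,000 elements with 512 threads per block this gives a 65105 x 3 grid, which is slightly more blocks than needed. Inside the kernel you would then typically flatten the block index as `blockIdx.y * gridDim.x + blockIdx.x` and guard with a check that the resulting element index is less than `n`.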
65535 per dimension of the grid. On compute 1.x cards, 1D and 2D grids are supported. On compute 2.x cards, 3D grids are also supported, so 65535, 65535 x 65535, and 65535 x 65535 x 65535 are the limits for Fermi (compute 2.x) cards.
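As a quick check of the arithmetic, the per-launch totals implied by a 65535 per-dimension limit can be computed like this. It is only a sketch: `maxBlocksPerLaunch` is a hypothetical helper written for this answer, and it hardcodes the compute 2.x limit rather than querying the device.

```cpp
#include <cassert>
#include <cstdint>

// Per-dimension grid limit quoted in this thread for compute capability 2.x.
const std::uint64_t kMaxBlocksPerDim = 65535;

// Hypothetical helper (not a CUDA API call): upper bound on the number of
// blocks in one launch for a grid with `dims` dimensions (1, 2, or 3).
std::uint64_t maxBlocksPerLaunch(int dims) {
    assert(dims >= 1 && dims <= 3);
    std::uint64_t total = 1;
    for (int i = 0; i < dims; ++i) {
        total *= kMaxBlocksPerDim;  // 65535^3 is below 2^48, so no overflow
    }
    return total;
}
```

So a 2D grid tops out at 65535 * 65535 = 4,294,836,225 blocks, which is the first reading in the question, not the sqrt(65535) * sqrt(65535) one.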