
CUDA: thread execution order in a 3D block

Tags:

cuda

As the title says, I would like to know the execution order of threads in a 3D block.

I remember reading something about this a while ago, but I don't remember where, and the source didn't look very reliable.

Anyway, I would like some confirmation.

Is it as the following (divided in warps)?

[0, 0, 0]...[blockDim.x - 1, 0, 0] - [0, 1, 0]...[blockDim.x - 1, 1, 0] - (...) - [0, blockDim.y - 1, 0]...[blockDim.x - 1, blockDim.y - 1, 0] - [0, 0, 1]...[blockDim.x - 1, 0, 1] - (...) - [0, blockDim.y - 1, 1]...[blockDim.x - 1, blockDim.y - 1, 1] - (...) - [blockDim.x - 1, blockDim.y - 1, blockDim.z - 1]

asked Jul 16 '12 13:07 by elect




1 Answer

Yes, that is the correct ordering. Within a block, threads are numbered with the x dimension varying fastest, then y, then z (equivalent to column-major order). The calculation can be expressed as:

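// Linear thread number within the block: x varies fastest, then y, then z
int threadID = threadIdx.x + 
               blockDim.x * threadIdx.y + 
               (blockDim.x * blockDim.y) * threadIdx.z;

int warpID = threadID / warpSize;  // warp within the block
int laneID = threadID % warpSize;  // thread within the warp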

Here threadID is the thread number within the block, warpID is the warp within the block and laneID is the thread number within the warp.
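To check where a particular thread lands, the same arithmetic can be mirrored on the host. This is only a sketch in plain C++ so that it runs without a GPU: `Dim3`, `ThreadPlace`, `place` and `kWarpSize` are hypothetical names introduced here for illustration, and the warp size is assumed to be 32 (real device code should read the built-in `warpSize` instead).

```cpp
#include <cassert>

// Hypothetical stand-in for CUDA's dim3/uint3, so no CUDA headers are needed.
struct Dim3 { unsigned x, y, z; };

// Assumption: 32 threads per warp. Device code should use warpSize instead.
constexpr unsigned kWarpSize = 32;

// Linear position of a thread inside its block, plus its warp and lane.
struct ThreadPlace { unsigned linear, warp, lane; };

// Same formula as above: x varies fastest, then y, then z.
inline ThreadPlace place(Dim3 block, Dim3 idx) {
    unsigned linear = idx.x
                    + block.x * idx.y
                    + block.x * block.y * idx.z;
    return ThreadPlace{linear, linear / kWarpSize, linear % kWarpSize};
}

// Usage example for an 8 x 4 x 4 block (128 threads, i.e. 4 warps of 32).
inline void example() {
    Dim3 block{8u, 4u, 4u};
    // One z-slice holds 8 * 4 = 32 threads, so thread (0, 0, 1)
    // is the first thread of the second warp.
    ThreadPlace p = place(block, Dim3{0u, 0u, 1u});
    assert(p.linear == 32 && p.warp == 1 && p.lane == 0);
}
```

With this block shape each z-slice happens to fill exactly one warp. For shapes where blockDim.x * blockDim.y is not a multiple of the warp size, a single warp spans more than one z-slice.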

Note that threads are not necessarily executed in any predictable order related to this numbering within a block. The execution model guarantees that threads in the same warp are executed in "lock-step" (on architectures without independent thread scheduling, i.e. before Volta), but you can't infer any more than that from the thread numbering within a block.

answered Oct 09 '22 06:10 by talonmies