
CUDA thread addressing (threadIdx.x, threadIdx.y, threadIdx.z) and block addressing (blockIdx.x, blockIdx.y)

Tags:

cuda

I just need to clarify something very basic - with most of the computational examples using something like:

ID = blockIdx.x*blockDim.x+threadIdx.x;

// ... then do computation on array[ID]

My question is: if I want to use the maximum number of threads in a block (1024), do I really need to construct my thread ID from all of threadIdx.x, threadIdx.y, and threadIdx.z?

If so, what is a recommended way to hash it into a single value?

If not, why do people use them in a similar fashion in image-processing operations, such as in this post:

https://stackoverflow.com/questions/11503406/cuda-addressing-a-matrix

How about blockIdx.x and blockIdx.y: are they in the same boat as threadIdx in this regard?

Stochastika asked May 06 '13 17:05


2 Answers

Creating 2D or 3D threadblocks is usually done because the problem lends itself to a 2D or 3D interpretation of the data, and handling it using a 2D or 3D threadblock may make the code more readable. But there's no specific reason why it cannot be done with a 1D threadblock with appropriate indexing.

Creating a 2D or 3D grid (of blocks) is usually done for the reason described above and/or to get around the limit, on devices of compute capability (CC) below 3.0, on the number of blocks in any one dimension of a grid (65535 blocks maximum per dimension).

For the threadblock case, you can use 1024 threads in a single block in a single dimension, so you don't need to construct your ID variable with threadIdx.y or threadIdx.z if you don't want to.
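If you do launch a 2D or 3D block and still want one number per thread, the usual approach is a row-major linearization rather than a hash. The following is a minimal sketch of that idea; the helper name `flatten3D` and the kernel `writeThreadIds` are made up for illustration and are not from the question.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper (not from the post): row-major linearization of a
// 3D index (x, y, z) inside a box of extent (dx, dy, *); x varies fastest.
__host__ __device__ inline unsigned flatten3D(unsigned x, unsigned y, unsigned z,
                                              unsigned dx, unsigned dy)
{
    return x + y * dx + z * dx * dy;
}

// Each thread stores its flattened in-block ID. Assumes a 1D grid and that
// `out` holds gridDim.x * (threads per block) elements.
__global__ void writeThreadIds(unsigned *out)
{
    unsigned tid = flatten3D(threadIdx.x, threadIdx.y, threadIdx.z,
                             blockDim.x, blockDim.y);
    unsigned threadsPerBlock = blockDim.x * blockDim.y * blockDim.z;
    out[blockIdx.x * threadsPerBlock + tid] = tid;
}
```

With a block of, say, dim3(8, 4, 2), the thread at (3, 2, 1) gets ID 3 + 2*8 + 1*8*4 = 51, and the 64 threads of the block cover 0 through 63 exactly once.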

If you have a pre CC 3.0 device, and your problem is large enough in terms of blocks, you may still want to construct a 2D grid. You can still use 1D threadblocks in that grid. In that case, a unique ID variable can be created like:

 int idx = threadIdx.x + (((gridDim.x * blockIdx.y) + blockIdx.x)*blockDim.x);  

The above construct should handle 1D threadblocks with any 2D grid.
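As a rough sketch of how such an index might be used end to end, the code below picks a 2D grid on the host and guards the access in the kernel. The names `kMaxGridDim`, `makeGrid2D`, and `scale` are assumptions for illustration, and the index is widened to `size_t` because the product of grid and block sizes can exceed what an `int` holds.

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Per-dimension block limit on pre-CC 3.0 devices, as stated in the answer.
constexpr unsigned kMaxGridDim = 65535;

// Hypothetical helper: choose a 2D grid containing at least `blocksNeeded` blocks.
inline dim3 makeGrid2D(unsigned blocksNeeded)
{
    unsigned gx = blocksNeeded < kMaxGridDim ? blocksNeeded : kMaxGridDim;
    if (gx == 0) gx = 1;                          // a grid needs at least one block
    unsigned gy = (blocksNeeded + gx - 1) / gx;   // ceiling division
    if (gy == 0) gy = 1;
    return dim3(gx, gy);
}

// 1D thread blocks inside a 2D grid; multiplies each element by `factor`.
__global__ void scale(float *data, size_t n, float factor)
{
    size_t idx = threadIdx.x
               + (size_t(gridDim.x) * blockIdx.y + blockIdx.x) * blockDim.x;
    if (idx < n)        // the grid may contain more threads than elements
        data[idx] *= factor;
}
```

A launch could then look like `scale<<<makeGrid2D(blocks), 256>>>(d_data, n, 2.0f);`, where `blocks` is `n` divided by 256 and rounded up. The bounds check matters because the last row of the grid is usually only partly needed.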

There are other methods besides constructing a 2D grid to work with large problem sizes, such as having your blocks handle multiple chunks of data in a loop of some sort.
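One common form of such a loop is the grid-stride loop, where each thread starts at its global index and then jumps ahead by the total number of threads in the grid. The sketch below assumes a 1D grid of 1D blocks; `addOne` and the host wrapper `addOneOnDevice` are hypothetical names, and the fixed launch size of 32 blocks of 256 threads is arbitrary.

```cuda
#include <cuda_runtime.h>
#include <cstddef>
#include <vector>

// Grid-stride loop: works for any n, however small the grid is.
__global__ void addOne(float *data, size_t n)
{
    size_t stride = size_t(blockDim.x) * gridDim.x;   // total threads in the grid
    for (size_t i = size_t(blockIdx.x) * blockDim.x + threadIdx.x;
         i < n; i += stride)
        data[i] += 1.0f;
}

// Hypothetical host wrapper: copies to the device, launches a fixed-size
// grid, and copies back. Returns false if any CUDA call reports an error.
inline bool addOneOnDevice(std::vector<float> &host)
{
    float *dev = nullptr;
    size_t bytes = host.size() * sizeof(float);
    if (cudaMalloc(&dev, bytes) != cudaSuccess) return false;
    bool ok = cudaMemcpy(dev, host.data(), bytes,
                         cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        addOne<<<32, 256>>>(dev, host.size());
        ok = cudaMemcpy(host.data(), dev, bytes,
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(dev);
    return ok;
}
```

Because the grid size no longer has to match the problem size, this approach sidesteps the per-dimension block limit entirely.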

Robert Crovella answered Sep 29 '22 08:09


This is on top of Robert Crovella's answer:

Creating a 2D/3D grid is not just for readability; it also lets you exploit 2D/3D locality in the on-chip shared memory, which provides much faster access. You cannot exploit such locality efficiently with a 1D grid if your problem works on a 2D grid.
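One possible illustration of this point is a tiled matrix transpose, where each 2D block stages a square tile in shared memory so that both the global read and the global write stay row-contiguous. This is only a sketch under assumptions: the tile size `TILE`, the kernel `transposeTiled`, and the helper `transposeGrid` are made-up names, and the kernel expects to be launched with `TILE` x `TILE` threads per block.

```cuda
#include <cuda_runtime.h>

constexpr int TILE = 32;   // assumed tile edge; block must be TILE x TILE threads

// Hypothetical helper: 2D grid that covers a width x height matrix with tiles.
inline dim3 transposeGrid(int width, int height)
{
    return dim3((width + TILE - 1) / TILE, (height + TILE - 1) / TILE);
}

// in:  height rows x width columns (row-major)
// out: width rows x height columns (row-major)
__global__ void transposeTiled(const float *in, float *out, int width, int height)
{
    __shared__ float tile[TILE][TILE + 1];   // +1 column of padding against bank conflicts

    int x = blockIdx.x * TILE + threadIdx.x;   // column in `in`
    int y = blockIdx.y * TILE + threadIdx.y;   // row in `in`
    if (x < width && y < height)
        tile[threadIdx.y][threadIdx.x] = in[y * width + x];

    __syncthreads();   // the whole tile must be loaded before anyone reads it

    // Swap the block coordinates so consecutive threads write consecutive
    // addresses in `out`; the transposition happens in the shared-memory read.
    x = blockIdx.y * TILE + threadIdx.x;       // column in `out`
    y = blockIdx.x * TILE + threadIdx.y;       // row in `out`
    if (x < height && y < width)
        out[y * height + x] = tile[threadIdx.x][threadIdx.y];
}
```

A launch might look like `transposeTiled<<<transposeGrid(w, h), dim3(TILE, TILE)>>>(d_in, d_out, w, h);`. The 2D block shape is what makes the tile indexing direct here; the same kernel could be written with 1D indices, at the cost of computing the row and column by hand.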

user2030440 answered Sep 29 '22 07:09