
3D image indices

Tags:

cuda

I have an image of size 512 x 512 x 512 and need to process every voxel individually. How can I compute the thread ID to do this? If I use a 1D thread ID, the number of blocks will exceed 65536.

    int id = blockIdx.x*blockDim.x + threadIdx.x;

Note: my card doesn't support 3D grids.

user570593 asked Mar 21 '26 02:03


1 Answer

You can use 3D grid indices with CUDA 4.0 on devices of compute capability 2.0+. Example code:

int blocksInX = (nx+8-1)/8;
int blocksInY = (ny+8-1)/8;
int blocksInZ = (nz+8-1)/8;

dim3 Dg(blocksInX, blocksInY, blocksInZ);
dim3 Db(8, 8, 8);
foo_kernel<<<Dg, Db>>>(R, nx, ny, nz);

...

__global__ void foo_kernel( float* R, const int nx, const int ny, const int nz )
{
  unsigned int xIndex = blockDim.x * blockIdx.x + threadIdx.x;
  unsigned int yIndex = blockDim.y * blockIdx.y + threadIdx.y;
  unsigned int zIndex = blockDim.z * blockIdx.z + threadIdx.z;

  if ( (xIndex < nx) && (yIndex < ny) && (zIndex < nz) )
  {
    unsigned int index_out = xIndex + nx*yIndex + nx*ny*zIndex;
    ...
    R[index_out] = ...;
  }
}
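
To sanity-check the launch arithmetic above without a GPU, here is a minimal host-side sketch in plain C++ (it should also compile as host code in a .cu file). The helper names `blocks_for` and `voxel_offset` are mine, not part of CUDA; they just mirror the `(n+8-1)/8` ceiling division and the `xIndex + nx*yIndex + nx*ny*zIndex` formula from the kernel.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: number of blocks of `threads` threads needed to
// cover `n` elements along one axis (ceiling division, as in the launch code).
inline int blocks_for(int n, int threads) {
    assert(n >= 0 && threads > 0);
    return (n + threads - 1) / threads;
}

// Hypothetical helper: linear offset of voxel (x, y, z) in an nx * ny * nz
// volume with x varying fastest, matching index_out in the kernel.
// size_t is used so large volumes do not overflow a 32-bit int.
inline std::size_t voxel_offset(int x, int y, int z, int nx, int ny) {
    return static_cast<std::size_t>(x)
         + static_cast<std::size_t>(nx) * static_cast<std::size_t>(y)
         + static_cast<std::size_t>(nx) * static_cast<std::size_t>(ny)
           * static_cast<std::size_t>(z);
}
```

For the 512 x 512 x 512 volume with 8 x 8 x 8 blocks, `blocks_for(512, 8)` gives 64 blocks per axis, and the last voxel (511, 511, 511) lands at offset 512^3 - 1, so every voxel gets its own slot in `R`.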

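If your device doesn't support compute capability 2.0, you can work around it by folding the Y and Z block indices into the Y dimension of a 2D grid and unpacking them inside the kernel: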

int threadsInX = 16;
int threadsInY = 4;
int threadsInZ = 4;

int blocksInX = (nx+threadsInX-1)/threadsInX;
int blocksInY = (ny+threadsInY-1)/threadsInY;
int blocksInZ = (nz+threadsInZ-1)/threadsInZ;

dim3 Dg = dim3(blocksInX, blocksInY*blocksInZ);
dim3 Db = dim3(threadsInX, threadsInY, threadsInZ);

foo_kernel<<<Dg, Db>>>(R, nx, ny, nz, blocksInY, 1.0f/(float)blocksInY);

__global__ void foo_kernel(float *R, const int nx, const int ny, const int nz,
                           unsigned int blocksInY, float invBlocksInY)
{

    unsigned int blockIdxz = __float2uint_rd(blockIdx.y * invBlocksInY);
    unsigned int blockIdxy = blockIdx.y - __umul24(blockIdxz, blocksInY);
    unsigned int xIndex = __umul24(blockIdx.x, blockDim.x) + threadIdx.x;
    unsigned int yIndex = __umul24(blockIdxy, blockDim.y) + threadIdx.y;
    unsigned int zIndex = __umul24(blockIdxz, blockDim.z) + threadIdx.z;

    if ( (xIndex < nx) && (yIndex < ny) && (zIndex < nz) )
    {
        unsigned int index = xIndex + nx*yIndex + nx*ny*zIndex;
        ...
        R[index] = ...;
    }

}
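
If the float reciprocal looks opaque, the same unpacking can be written with integer division and modulo. The sketch below is plain host-side C++ so it can be checked without a GPU; `BlockYZ` and `unfold_block_y` are hypothetical names of mine. Note that this swaps the `__float2uint_rd` / `__umul24` pair for `/` and `%`, which is presumably slower on compute capability 1.x hardware but avoids any float rounding concerns.

```cpp
#include <cassert>

// Hypothetical struct holding the unpacked Y and Z block indices.
struct BlockYZ {
    unsigned int y;
    unsigned int z;
};

// Hypothetical helper: unpack the folded grid index (what the kernel sees
// as blockIdx.y when launched with gridDim.y = blocksInY * blocksInZ)
// back into separate Y and Z block indices.
inline BlockYZ unfold_block_y(unsigned int foldedY, unsigned int blocksInY) {
    assert(blocksInY > 0);
    BlockYZ b;
    b.z = foldedY / blocksInY;  // which Z slab of blocks
    b.y = foldedY % blocksInY;  // block position inside that slab
    return b;
}
```

For the 512 x 512 x 512 volume with 16 x 4 x 4 threads per block, the grid is 32 x (128 * 128) = 32 x 16384 blocks, so both grid dimensions stay below the 65535 limit, and a folded index of 129 unpacks to block (y = 1, z = 1).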
MeinLieberTanz answered Mar 23 '26 15:03


