
GPU 2D shared memory dynamic allocation

I am aware of the dynamic allocation when 1D arrays are used, but how can it be done when 2D arrays are used?

myKernel<<<blocks, threads, sizeofSharedMemoryinBytes>>>();
         ....

__global__ void myKernel(){
 __shared__ float sData[][];
     .....
}

Say I want to allocate a 2D shared memory array:

__shared__ float sData[32][32];

How can it be done dynamically? Would it be:

myKernel<<< blocks, threads, sizeof(float)*32*32 >>>();
Manolete asked Nov 02 '12 13:11



1 Answer

As you correctly wrote, you have to specify the size of the dynamically allocated shared memory in the execution configuration of each kernel launch (the third argument in <<<blocks, threads, sizeofSharedMemoryinBytes>>>). This is the number of bytes of shared memory that is dynamically allocated per block for that launch, in addition to any statically allocated shared memory. IMHO there is no way to declare such memory directly as a 2D array; you have to declare a 1D array and index it as if it were 2D. One last thing: don't forget the extern qualifier. So your code should look like this:

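   // Host side: bytes of dynamic shared memory requested per block
   size_t sizeofSharedMemoryinBytes = dimX * dimY * sizeof(float);

   myKernel<<<blocks, threads, sizeofSharedMemoryinBytes>>>();
     ....

   __global__ void myKernel() {

       // Unsized extern array; its size comes from the launch configuration
       extern __shared__ float sData[];
       .....
       // Row-major index of element (x, y); dimX must be visible in the kernel
       sData[dimX * y + x] = ...
   }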
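For something that compiles as written, here is a minimal sketch of the same idea. The names fillTile, fillTileFixedWidth, TILE_W and runFillTile are made up for illustration and are not from the question, and the sketch assumes a single block of dimX * dimY threads (at most 1024 on current hardware). The second kernel shows a variation: if the row width is a compile-time constant, the flat buffer can be viewed through a pointer-to-array, which gives tile[y][x] syntax on top of the same dynamically sized allocation.

```cuda
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread writes its own (x, y) cell of a
// dimX-by-dimY tile in dynamic shared memory, then the tile is copied
// to global memory so the host can inspect it.
__global__ void fillTile(float *out, int dimX, int dimY)
{
    // Unsized extern array: the byte count comes from the third
    // launch parameter, not from this declaration.
    extern __shared__ float sData[];

    int x = threadIdx.x;
    int y = threadIdx.y;
    bool inside = (x < dimX && y < dimY);

    // Row-major flattening: element (y, x) lives at y * dimX + x.
    if (inside) sData[y * dimX + x] = 100.0f * y + x;
    __syncthreads();
    if (inside) out[y * dimX + x] = sData[y * dimX + x];
}

// Assumption for the variant below: the row width is fixed at compile time.
constexpr int TILE_W = 32;

__global__ void fillTileFixedWidth(float *out, int dimY)
{
    extern __shared__ float sData[];

    // View the flat buffer as rows of TILE_W floats.
    float (*tile)[TILE_W] = reinterpret_cast<float (*)[TILE_W]>(sData);

    int x = threadIdx.x;
    int y = threadIdx.y;
    bool inside = (x < TILE_W && y < dimY);

    if (inside) tile[y][x] = 100.0f * y + x;
    __syncthreads();
    if (inside) out[y * TILE_W + x] = tile[y][x];
}

// Hypothetical host helper: launches one block and returns the tile,
// or an empty vector if any CUDA call reports an error.
std::vector<float> runFillTile(int dimX, int dimY, bool fixedWidth)
{
    if (fixedWidth) dimX = TILE_W;  // the variant ignores the requested width

    size_t count = static_cast<size_t>(dimX) * dimY;
    size_t bytes = count * sizeof(float);

    float *dOut = nullptr;
    if (cudaMalloc(&dOut, bytes) != cudaSuccess) return {};

    dim3 threads(dimX, dimY);
    if (fixedWidth)
        fillTileFixedWidth<<<1, threads, bytes>>>(dOut, dimY);
    else
        fillTile<<<1, threads, bytes>>>(dOut, dimX, dimY);

    std::vector<float> host(count);
    cudaError_t err = cudaMemcpy(host.data(), dOut, bytes,
                                 cudaMemcpyDeviceToHost);
    cudaFree(dOut);
    if (err != cudaSuccess) return {};
    return host;
}
```

Calling runFillTile(32, 32, false) from a host main requests sizeof(float)*32*32 bytes of shared memory, the same amount as in the question. On a working device, element [3 * 32 + 5] of the result should be 305. The pointer-to-array view only works when the width is known at compile time; for a width chosen at run time, the flattened index is the way to go.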
stuhlo answered Nov 11 '22 12:11