I am aware of how dynamic allocation works when 1D arrays are used, but how can it be done when 2D arrays are used?
myKernel<<<blocks, threads,sizeofSharedMemoryinBytes>>>();
....
__global__ void myKernel() {
__shared__ float sData[][];
.....
}
Say I want to allocate a 2D shared memory array:
__shared__ float sData[32][32];
How can it be done dynamically? Would it be:
myKernel<<< blocks, threads, sizeof(float)*32*32 >>>();
Dynamic memory allocation on the CPU/GPU: the shared memory of the GPU typically consists of 32 KB that has to be shared between all threads in one block. For single-precision floating-point vectors or matrices and 1024 threads per block, the maximum amount of shared memory per thread is 32 KB / (1024 * 4 B) = 8 elements. When the amount of memory that an individual thread uses is too large to fit in shared memory or in registers, it has to be placed in global memory instead.
There are two ways in which we can allocate shared memory: static and dynamic. If we know the amount of required shared memory at compile time, we can use static shared memory. Static shared memory always has to be declared inside a kernel, with the following syntax:
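A minimal sketch of a static declaration (the kernel name and the 32x32 tile size are only illustrative, echoing the question):

__global__ void staticSharedKernel() {
    // The dimensions must be compile-time constants, so a true 2D array is possible.
    __shared__ float sTile[32][32];
    // Assuming the block was launched with dim3 threads(32, 32):
    sTile[threadIdx.y][threadIdx.x] = 0.0f;
    // ...
}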
As you have correctly written, you have to specify the size of the dynamically allocated shared memory in the execution configuration of each kernel call (the third argument in <<<blocks, threads, sizeofSharedMemoryinBytes>>>). This specifies the number of bytes of shared memory that is dynamically allocated per block for this call, in addition to the statically allocated memory. IMHO there is no way to access such memory as a 2D array; you have to use a 1D array and index it like a 2D one. One last thing: don't forget the extern qualifier. So your code should look like this:
sizeofSharedMemoryinBytes = dimX * dimY * sizeof(float);
myKernel<<<blocks, threads, sizeofSharedMemoryinBytes>>>();
....
__global__ void myKernel() {
    // Dynamically allocated shared memory must be declared as an unsized extern 1D array.
    extern __shared__ float sData[];
    .....
    // Index the 1D buffer as a 2D array in row-major order: element (x, y) is at dimX * y + x.
    sData[dimX * y + x] = ...
}
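For completeness, here is a self-contained sketch of the whole pattern; the kernel name fillTile, the 32x32 tile size, and the fill values are illustrative assumptions, not part of the original question:

#include <cstdio>
#include <cuda_runtime.h>

// Kernel that uses dynamically allocated shared memory, indexed as a 2D tile.
__global__ void fillTile(float *out, int dimX, int dimY) {
    extern __shared__ float sData[];   // dimX * dimY floats, size supplied at launch
    int x = threadIdx.x;
    int y = threadIdx.y;
    if (x < dimX && y < dimY) {
        sData[dimX * y + x] = (float)(dimX * y + x);   // treat the 1D buffer as 2D
    }
    __syncthreads();
    if (x < dimX && y < dimY) {
        out[dimX * y + x] = sData[dimX * y + x];
    }
}

int main() {
    const int dimX = 32, dimY = 32;
    float *dOut;
    cudaMalloc(&dOut, dimX * dimY * sizeof(float));

    dim3 threads(dimX, dimY);   // one block of 32 x 32 threads
    size_t sizeofSharedMemoryinBytes = dimX * dimY * sizeof(float);
    fillTile<<<1, threads, sizeofSharedMemoryinBytes>>>(dOut, dimX, dimY);
    cudaDeviceSynchronize();

    float host[dimX * dimY];
    cudaMemcpy(host, dOut, sizeof(host), cudaMemcpyDeviceToHost);
    printf("element (x=3, y=5) = %f\n", host[dimX * 5 + 3]);   // prints 163.000000
    cudaFree(dOut);
    return 0;
}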