In a kernel function, I want two arrays in shared memory, each holding size floats (that is, sizeof(float)*size bytes each).
Since a shared array cannot be statically sized inside the kernel when its size is only known at run time, I had to allocate it dynamically, like:
myKernel<<<numBlocks, numThreads, 2*sizeof(float)*size>>> (...);
and, inside the kernel:
extern __shared__ float row[];
extern __shared__ float results[];
But this doesn't work.
Instead, I declared a single array, extern __shared__ float rowresults[], that holds all the data in the 2*size floats allocated. Accesses to row stay the same, and accesses to results become rowresults[size+previousIndex]. This does work.
It isn't a big problem, since I get the expected results anyway, but is there a way to split my dynamically allocated shared memory into two (or more) separate variables? Just for cleaner code.
The CUDA C Programming Guide's section on __shared__ includes an example of allocating multiple arrays from dynamically allocated shared memory:
extern __shared__ float array[];
__device__ void func()      // __device__ or __global__ function
{
    short* array0 = (short*)array;
    float* array1 = (float*)&array0[128];
    int*   array2 = (int*)&array1[64];
}
Since you're just taking a pointer to an element and treating it as the start of a new array, I believe you could adapt that to use dynamic offsets instead of the static offsets in the example. The guide also notes that each pointer must be aligned to the type it points to, which shouldn't be an issue in your case since both arrays are float.
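For your case, that could look something like the sketch below. As far as I know, the reason two extern __shared__ declarations don't work is that all arrays declared that way start at the same address, so row and results would overlap. The kernel body, its parameters (in, out), the arithmetic and the host helper runSplitDemo are all made up for illustration; only the pointer split is the point. Error checking is left out for brevity.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: a single dynamic shared-memory buffer of 2*size floats,
// split into two logical arrays with a run-time offset.
__global__ void myKernel(const float* in, float* out, int size)
{
    extern __shared__ float shared[];  // sized by the third launch parameter
    float* row     = shared;           // first  size floats
    float* results = shared + size;    // second size floats

    // Load the input into row.
    for (int i = threadIdx.x; i < size; i += blockDim.x)
        row[i] = in[i];
    __syncthreads();  // row must be fully written before other threads read it

    // Illustrative work: each thread reads an element loaded by another thread.
    for (int i = threadIdx.x; i < size; i += blockDim.x) {
        results[i] = 2.0f * row[size - 1 - i];
        out[i] = results[i];
    }
}

// Hypothetical host helper: runs the kernel on the values 0..size-1
// and returns the output (assumes a CUDA-capable device is present).
std::vector<float> runSplitDemo(int size)
{
    std::vector<float> h_in(size), h_out(size, 0.0f);
    for (int i = 0; i < size; ++i) h_in[i] = static_cast<float>(i);

    float* d_in = nullptr;
    float* d_out = nullptr;
    size_t bytes = size * sizeof(float);
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);

    // One block of 64 threads; shared memory holds both arrays.
    myKernel<<<1, 64, 2 * bytes>>>(d_in, d_out, size);

    cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return h_out;
}
```

Inside the kernel, row[i] and results[i] then read like two separate arrays, with no size+index arithmetic at each access. If the arrays had different element types, I would place the type with the strictest alignment first so that every derived pointer stays aligned.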