How to define a CUDA shared memory with a size known at run time?

The __shared__ memory in CUDA seems to require a size that is known at compile time. However, in my problem, the __shared__ memory size is only known at run time, i.e.

int size=get_size();
__shared__ mem[size];

This will end up with "error: constant value is not known", and I'm not sure how to get around this problem.

Hailiang Zhang asked Mar 30 '12 02:03




1 Answer

The purpose of shared memory is to allow the threads in a block to collaborate. When you declare an array as __shared__, each thread in the block sees the same memory, so it would not make sense for a given thread to be able to set its own size for an array in shared memory.

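However, the special case of dynamically specifying the size of a single __shared__ array that is the same size for all threads is supported: declare the array as extern __shared__ with no size, and pass its size in bytes as the third parameter of the kernel launch configuration. See allocating shared memory.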
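A minimal sketch of that dynamically sized case. The kernel name reverse_kernel and the host wrapper reverse_on_device are hypothetical names chosen for illustration, and the sketch assumes the whole array fits in one thread block:

```cuda
#include <cuda_runtime.h>
#include <stdexcept>
#include <vector>

// Reverses n ints inside one block, staging them in shared memory
// whose size is chosen at launch time rather than at compile time.
__global__ void reverse_kernel(int *data, int n)
{
    // No size in the declaration: the byte count comes from the
    // third parameter of the <<<...>>> launch configuration.
    extern __shared__ int mem[];

    int t = threadIdx.x;
    if (t < n) mem[t] = data[t];
    __syncthreads();  // every write to mem must finish before any read
    if (t < n) data[t] = mem[n - 1 - t];
}

// Hypothetical host wrapper. Assumes v.size() does not exceed the
// maximum number of threads per block (1024 on current GPUs).
void reverse_on_device(std::vector<int> &v)
{
    int n = static_cast<int>(v.size());
    if (n == 0) return;
    size_t bytes = n * sizeof(int);  // size known only at run time

    int *d = nullptr;
    if (cudaMalloc(&d, bytes) != cudaSuccess)
        throw std::runtime_error("cudaMalloc failed");
    cudaMemcpy(d, v.data(), bytes, cudaMemcpyHostToDevice);

    // grid = 1 block, block = n threads, dynamic shared memory = bytes
    reverse_kernel<<<1, n, bytes>>>(d, n);
    cudaError_t err = cudaGetLastError();  // catches a bad launch configuration
    if (err == cudaSuccess)
        err = cudaMemcpy(v.data(), d, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d);
    if (err != cudaSuccess)
        throw std::runtime_error(cudaGetErrorString(err));
}
```

Calling reverse_on_device on a std::vector<int> reverses it in place. Only one unsized extern __shared__ array is available per kernel, so if several run-time-sized arrays are needed, they have to be carved out of that single buffer by offset.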

If you do need to dynamically allocate memory for each thread, you can use new or malloc inside a kernel (on Fermi and later GPUs, i.e. compute capability 2.0 and up), but they allocate from a heap in global memory, which is likely to be slow.
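For comparison, a rough sketch of the per-thread route using in-kernel malloc. The names per_thread_scratch and run_per_thread_scratch are hypothetical, and giving each thread a buffer of t + 1 ints is just an assumption made to show sizes that differ between threads:

```cuda
#include <cuda_runtime.h>
#include <stdexcept>
#include <vector>

// Each thread allocates its own scratch buffer from the device heap,
// which lives in global memory (not shared memory).
__global__ void per_thread_scratch(int *out, int n)
{
    int t = blockIdx.x * blockDim.x + threadIdx.x;
    if (t >= n) return;

    int len = t + 1;  // per-thread size, known only at run time
    int *buf = static_cast<int *>(malloc(len * sizeof(int)));
    if (buf == nullptr) {  // device heap exhausted
        out[t] = -1;
        return;
    }

    for (int i = 0; i < len; ++i) buf[i] = i;
    int sum = 0;
    for (int i = 0; i < len; ++i) sum += buf[i];
    out[t] = sum;  // 0 + 1 + ... + t

    free(buf);  // device-side free for device-side malloc
}

// Hypothetical host wrapper: returns the sum computed by each of n threads.
std::vector<int> run_per_thread_scratch(int n)
{
    std::vector<int> out(n);
    if (n <= 0) return out;

    int *d = nullptr;
    if (cudaMalloc(&d, n * sizeof(int)) != cudaSuccess)
        throw std::runtime_error("cudaMalloc failed");

    per_thread_scratch<<<(n + 63) / 64, 64>>>(d, n);
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess)
        err = cudaMemcpy(out.data(), d, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d);
    if (err != cudaSuccess)
        throw std::runtime_error(cudaGetErrorString(err));
    return out;
}
```

With n = 4 the wrapper should return {0, 1, 3, 6}. If the default device heap turns out to be too small, it can be enlarged with cudaDeviceSetLimit(cudaLimitMallocHeapSize, ...) before the first kernel that calls malloc is launched.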

Roger Dahl answered Oct 16 '22 03:10
