When setting up a fixed size array in a kernel, such as:
int my_array[100];
In which memory space does the array end up?
In particular, I would like to find out whether such an array may be stored in the register file or in shared memory on devices of compute capability 2.0 or higher and, if so, what the requirements are.
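For Fermi (and probably earlier architectures), for an array to be stored in the register file, the following conditions must be met:
1. The array is indexed only with constants, that is, with indexes that can be resolved at compile time.
2. Enough registers are available.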
The reason for (1) is that register indexes are encoded directly within the SASS instructions. There is no way to address registers indirectly.
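The main factors that limit the number of registers available for (2) are the per-thread register limit (63 registers on compute capability 2.x) and the size of the SM's register file relative to the number of threads resident on it.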
A potential workaround for (1) is loop unrolling. If a loop uses a loop counter as an index into an array, unrolling the loop (with #pragma unroll or manually) causes the array indexes to become constants, as there is now a separate SASS instruction for each array access.
This answer is based in part on the NVIDIA presentation "Local Memory and Register Spilling", which also goes into detail about how the locations of variables and arrays affect performance.
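The loop-unrolling workaround described above can be sketched as follows. This is only an illustration under stated assumptions: the kernel name sum_unrolled, the helper run_sum_unrolled and the array size of 8 are made up for the example, and whether the array really ends up in registers is still the compiler's decision.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: each thread fills a small private array and sums it.
// Both loops have a compile-time trip count and are fully unrolled, so every
// access to vals[] uses a constant index. That satisfies condition (1) and
// makes the array a candidate for the register file.
__global__ void sum_unrolled(const int *in, int *out)
{
    int vals[8];
    const int tid  = blockIdx.x * blockDim.x + threadIdx.x;
    const int base = tid * 8;

    #pragma unroll
    for (int i = 0; i < 8; ++i)
        vals[i] = in[base + i];   // becomes vals[0] = ..., vals[1] = ..., etc.

    int total = 0;
    #pragma unroll
    for (int i = 0; i < 8; ++i)
        total += vals[i];

    out[tid] = total;
}

// Hypothetical host helper: runs a single thread on the values 0..7 and
// returns the sum it computed. Error checking is omitted for brevity.
int run_sum_unrolled()
{
    int h_in[8] = {0, 1, 2, 3, 4, 5, 6, 7};
    int h_out = -1;
    int *d_in = nullptr, *d_out = nullptr;

    cudaMalloc(&d_in, sizeof(h_in));
    cudaMalloc(&d_out, sizeof(int));
    cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);

    sum_unrolled<<<1, 1>>>(d_in, d_out);

    cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return h_out;
}
```

If the loop bound were a run-time value instead of the constant 8, the compiler could not fully unroll the loops and the indexes would stay dynamic.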
Local arrays within a kernel, such as the one you have defined, are allocated in registers when possible and in local memory when there are not enough registers.
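One way to see where such an array ended up is to ask the runtime how many registers and how much local memory per thread a compiled kernel uses. The sketch below uses cudaFuncGetAttributes for that; the kernel dynamic_index, its scale parameter and the helper report_kernel_usage are hypothetical names invented for the example. A non-zero local memory size suggests that something (possibly the array) was placed in local memory, but the exact numbers depend on the compiler version and target architecture.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: the index used to read my_array comes from run-time
// data, so the compiler cannot turn it into a constant.
__global__ void dynamic_index(const unsigned int *idx, int scale, int *out)
{
    int my_array[100];
    for (int i = 0; i < 100; ++i)
        my_array[i] = i * scale;

    // Dynamic index: known only when the kernel runs.
    out[threadIdx.x] = my_array[idx[threadIdx.x] % 100];
}

// Hypothetical host helper: queries and prints the per-thread resource usage
// of dynamic_index. The kernel does not need to be launched for this.
cudaError_t report_kernel_usage(cudaFuncAttributes *attr)
{
    cudaError_t err = cudaFuncGetAttributes(attr, dynamic_index);
    if (err == cudaSuccess)
        printf("registers/thread: %d, local memory/thread: %zu bytes\n",
               attr->numRegs, attr->localSizeBytes);
    return err;
}
```

The same figures are printed at compile time when nvcc is invoked with -Xptxas -v.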
If you want to allocate the array in shared memory, you must declare it as follows:
__shared__ int my_array[100];
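Note that a __shared__ array behaves differently from the per-thread array in the question: there is one copy per block, visible to all threads of that block. A minimal sketch of how it might be used, with the kernel reverse_shared and the helper run_reverse_shared_first being hypothetical names and a block size of exactly 100 threads assumed:

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: the block stages 100 values in shared memory, then
// each thread reads the element written by a different thread.
// Assumes a launch with exactly one block of 100 threads.
__global__ void reverse_shared(const int *in, int *out)
{
    __shared__ int my_array[100];   // one copy per block, not per thread

    const int t = threadIdx.x;
    my_array[t] = in[t];
    __syncthreads();                // wait until every thread has written

    out[t] = my_array[99 - t];
}

// Hypothetical host helper: reverses 0..99 on the device and returns the
// first element of the result. Error checking is omitted for brevity.
int run_reverse_shared_first()
{
    int h_in[100], h_out[100];
    for (int i = 0; i < 100; ++i)
        h_in[i] = i;

    int *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, sizeof(h_in));
    cudaMalloc(&d_out, sizeof(h_out));
    cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);

    reverse_shared<<<1, 100>>>(d_in, d_out);

    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return h_out[0];
}
```

Because the array is shared, dynamic indexing is not a problem here, but threads must synchronize with __syncthreads() before reading elements written by other threads.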