
CUDA: In which memory space is a fixed size array stored?

Tags: arrays, cuda

When declaring a fixed-size array in a kernel, such as:

int my_array[100];

In which memory space does the array end up?

In particular, I would like to find out whether such an array may be stored in the register file or in shared memory on devices of compute capability 2.0 or higher and, if so, what the requirements are.

Roger Dahl asked Jun 08 '12 16:06



2 Answers

For Fermi (and probably earlier architectures), for an array to be stored in the register file, the following conditions must be met:

  1. The array is only indexed with constants
  2. There are registers available
  3. The compiler decides that keeping the array in registers is worthwhile (hopefully it analyzes the impact on overall performance)

The reason for (1) is that register indexes are encoded directly within the SASS instructions. There is no way to address registers indirectly.
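To make condition (1) concrete, here is a minimal sketch contrasting the two indexing styles. The kernel names, the array length of 4, and the host helper `run_demo` are hypothetical and exist only for illustration. Where the compiler actually places each array depends on the architecture and compiler version, so treat the comments as the expected tendency rather than a guarantee.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: every index into `a` is a compile-time constant, so
// each element can be mapped to a fixed register number.
__global__ void constant_index(const int *in, int *out)
{
    int a[4];
    a[0] = in[0];
    a[1] = in[1];
    a[2] = in[2];
    a[3] = in[3];
    *out = a[0] + a[1] + a[2] + a[3];
}

// Hypothetical kernel: `k` is only known at run time, so a[k] cannot be
// encoded as a fixed register number. The array is then a candidate for
// local memory instead of the register file.
__global__ void runtime_index(const int *in, int *out, int k)
{
    int a[4];
    a[0] = in[0];
    a[1] = in[1];
    a[2] = in[2];
    a[3] = in[3];
    *out = a[k & 3];  // the mask keeps the index within bounds
}

// Illustrative host helper: runs one of the kernels with a single thread
// and returns the value it wrote. Error checking is omitted for brevity.
int run_demo(bool use_runtime_index, int k)
{
    const int h_in[4] = {10, 20, 30, 40};
    int h_out = -1;
    int *d_in = nullptr;
    int *d_out = nullptr;

    cudaMalloc(&d_in, sizeof(h_in));
    cudaMalloc(&d_out, sizeof(int));
    cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);

    if (use_runtime_index)
        runtime_index<<<1, 1>>>(d_in, d_out, k);
    else
        constant_index<<<1, 1>>>(d_in, d_out);

    // The blocking copy also waits for the kernel to finish.
    cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return h_out;
}
```

Both kernels compute correct results either way; the difference is only in where `a` lives, which can be inspected in the generated SASS or in the compiler's resource usage report.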

The main factors that limit the number of registers for (2) are:

  • The SASS instructions contain only 6 bits for register indexing, which limits the number of registers that a thread can address to 64. The actual limit is 63 registers per thread, so one of the 64 encodings is evidently reserved.
  • An SM has a block of registers that are shared by all threads that are concurrently in flight.
  • Registers are also needed for holding variables, so the compiler must balance register usage for best overall performance.
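One way to see how the compiler resolved this balance for a particular kernel is to ask the runtime for the kernel's resource usage. The sketch below uses `cudaFuncGetAttributes`, whose `numRegs` and `localSizeBytes` fields report registers per thread and local memory bytes per thread. The kernel `scratch_kernel` and the helper `kernel_footprint` are hypothetical names chosen for illustration. Compiling with `nvcc -Xptxas -v` prints similar figures at build time.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical kernel with a 100-element per-thread array that is indexed
// with a run-time value, so it is a likely candidate for local memory.
__global__ void scratch_kernel(int *out, int k)
{
    int scratch[100];
    for (int i = 0; i < 100; ++i)
        scratch[i] = i * k;
    out[threadIdx.x] = scratch[(k + threadIdx.x) % 100];
}

// Illustrative helper: reports registers per thread and local memory bytes
// per thread for scratch_kernel. Returns false if the query fails.
bool kernel_footprint(int *num_regs, size_t *local_bytes)
{
    cudaFuncAttributes attr;
    if (cudaFuncGetAttributes(&attr, scratch_kernel) != cudaSuccess)
        return false;
    *num_regs = attr.numRegs;
    *local_bytes = attr.localSizeBytes;
    return true;
}
```

A nonzero `localSizeBytes` indicates that the kernel uses local memory, whether for an array placed there or for spilled registers. The exact numbers vary with the target architecture and compiler version.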

A potential workaround for (1) is loop unrolling. If a loop uses a loop counter as an index into an array, unrolling the loop (with #pragma unroll or manually) causes the array indexes to become constants as there is now a separate SASS instruction for each array access.
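The unrolling workaround might look like the following sketch. The array length `kLen`, the kernel name `unrolled_sum`, and the host helper `run_unrolled_sum` are assumptions made for this example. Note that `#pragma unroll` is a request, and full unrolling requires a trip count known at compile time.

```cuda
#include <cuda_runtime.h>

// Hypothetical array length: a compile-time constant, small enough to unroll.
constexpr int kLen = 8;

__global__ void unrolled_sum(const int *in, int *out)
{
    int a[kLen];

    // After full unrolling, each a[i] below becomes an access with a
    // constant index, which is what allows a register to be assigned to it.
    #pragma unroll
    for (int i = 0; i < kLen; ++i)
        a[i] = in[threadIdx.x * kLen + i];

    int sum = 0;
    #pragma unroll
    for (int i = 0; i < kLen; ++i)
        sum += a[i];

    out[threadIdx.x] = sum;
}

// Illustrative host helper: one thread sums the values 1..kLen.
// Error checking is omitted for brevity.
int run_unrolled_sum()
{
    int h_in[kLen];
    for (int i = 0; i < kLen; ++i)
        h_in[i] = i + 1;

    int h_out = -1;
    int *d_in = nullptr;
    int *d_out = nullptr;
    cudaMalloc(&d_in, sizeof(h_in));
    cudaMalloc(&d_out, sizeof(int));
    cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);

    unrolled_sum<<<1, 1>>>(d_in, d_out);

    cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return h_out;
}
```

If the loop bound were a kernel argument instead of a constant, the compiler could not unroll completely and the array indexes would stay dynamic.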

Based in part on this NVIDIA presentation: Local Memory and Register Spilling. The document also goes into detail about how the locations of variables and arrays affect performance.

Roger Dahl answered Nov 15 '22 08:11



Local arrays within a kernel, such as the one you have defined, are allocated in registers, or in local memory when there are not enough registers.

If you want to allocate the array in shared memory, you must declare it as follows:

__shared__ int my_array[100];
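
As a minimal sketch of how such a declaration is used, the kernel below stages data in the shared array and reads it back in reverse order. The kernel name `reverse_in_shared`, the launch configuration of one block of 100 threads, and the host helper `reversed_element` are assumptions for this example. Unlike a plain local array, a `__shared__` array has one copy per thread block rather than one per thread, so the threads must synchronize before reading elements written by other threads.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel, intended to be launched with one block of 100 threads.
__global__ void reverse_in_shared(const int *in, int *out)
{
    __shared__ int my_array[100];  // one copy shared by the whole block

    int t = threadIdx.x;
    my_array[t] = in[t];
    __syncthreads();               // wait until every element is written
    out[t] = my_array[99 - t];
}

// Illustrative host helper: fills the input with 0..99, runs the kernel,
// and returns the output element at position idx (0 <= idx < 100).
// Error checking is omitted for brevity.
int reversed_element(int idx)
{
    int h_in[100];
    int h_out[100];
    for (int i = 0; i < 100; ++i)
        h_in[i] = i;

    int *d_in = nullptr;
    int *d_out = nullptr;
    cudaMalloc(&d_in, sizeof(h_in));
    cudaMalloc(&d_out, sizeof(h_out));
    cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);

    reverse_in_shared<<<1, 100>>>(d_in, d_out);

    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return h_out[idx];
}
```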
pQB answered Nov 15 '22 08:11
