I have a large program that uses all the registers I allocated per thread (64) and spills to local memory. I would like to be able to tell the compiler which variables should remain in registers at all cost, and which ones I don't really care about. Does the "register" C/C++ keyword work in nvcc? Is there a different mechanism perhaps?
Thanks!
In general all scalar variables defined in CUDA code are stored in registers. Registers are local to a thread, and each thread has exclusive access to its own registers: values in registers cannot be accessed by other threads, even from the same block, and are not available for the host.
When a kernel is called, its execution configuration is provided through <<<...>>> syntax, e.g. cuda_hello<<<1,1>>>() . In CUDA terminology, this is called "kernel launch".
__global__ : 1. A qualifier added to standard C. This alerts the compiler that a function should be compiled to run on a device (GPU) instead of host (CPU). 2.
You can use register in CUDA C/C++ if you want to. In any context, it is only a hint to the compiler. It may be ignored. There is no stated guarantee that it does anything at all.
I think these statements are pretty much true for most language implementations of register.
I also think it's quite likely that the compiler can do a better job than you can of deciding what should be in registers, and appropriate priority.
The typical CUDA C/C++ mechanisms for controlling register usage work at a higher level, they are:
-maxrregcount compile switch
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With