As in title, in cuda programs, where does the kernel parameter resides after kernel launch, in local memory or global memory of GPU?
For example, in LLVM IR of a cuda program:
__global__ kernel(int param1):
%0 = alloca int
store param1, %0
So, in this case, where does %0 point to? local memory or global memory?
Also, I saw sometimes kernel parameters are held and use directly in registers instead of storing it in any memory. How this decision is made?
As Robert Corvella in his comment pointed out: parameters are stored in contant memory of the GPU.
However,
doing an alloca and a store of param1 to the allocated space moves copies the parameter from the constant memory to local memory.
alloca instructions are lowerd to stack allocations in PTX code.
In clang, this is the canonical way to handle function parameters during code generation. However, on GPUs this can (since PTX is optimized during lowering to SASS just saying: can) lead to a performance penalty because local memory goes through all cache levels down to global memory and is much slower than constant memory.
In LLVM you have the mem2reg optimizer pass. This pass promotes all memory allocations on the stack to registers.
In the case of kernel parameters you most likely want this optimization. The alloca and the store instructions disappear form your IR and the parameter will be used directly instead of an unnecessary copy.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With