
Kernel parameter passing in CUDA?

Tags: c++, c, memory, cuda

I have a newbie question about how CUDA kernels work.

I have the following code (which uses the cuPrintf function taken from here):

#include "cuPrintf.cu"

__global__ void testKernel(int param){
    cuPrintf("Param value: %d\n", param);
}

int main(void){

    // initialize cuPrintf
    cudaPrintfInit();

    int a = 456;    

    testKernel<<<4,1>>>(a);

    // display the device's greeting
    cudaPrintfDisplay();

    // clean up after cuPrintf
    cudaPrintfEnd();
}

The output of the execution is:

Param value: 456
Param value: 456
Param value: 456
Param value: 456

I cannot understand how the kernel can read the correct value of the parameter I pass. Isn't it allocated in host memory? Can the GPU read from host memory?

Thanks,

Andrea

asked Jun 27 '11 by Andrea



2 Answers

According to section E.2.5.2, Function Parameters, in the CUDA C Programming Guide:

__global__ function parameters are passed to the device:

  • via shared memory and are limited to 256 bytes on devices of compute capability 1.x,
  • via constant memory and are limited to 4 KB on devices of compute capability 2.x and higher.
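The limits above can be made concrete with a sketch (hypothetical names; assumes a device of compute capability 2.x or higher, where all kernel parameters together must fit in the 4 KB constant-memory budget):

```cuda
#include <cstdio>

// A small struct passed to the kernel by value. Together with any other
// parameters it must fit in the per-launch limit (4 KB on cc 2.x+),
// because the parameters travel to the device through constant memory.
struct Params {
    int   offset;
    float scale;
};

__global__ void scaleKernel(Params p) {
    // Each thread sees a read-only copy of p's values; the host-side
    // struct itself is never dereferenced from the device.
    printf("thread %d: offset=%d scale=%f\n",
           threadIdx.x, p.offset + threadIdx.x, p.scale);
}

int main(void) {
    Params p = {10, 2.5f};        // lives in host memory
    scaleKernel<<<1, 4>>>(p);     // the value is copied at launch time
    cudaDeviceSynchronize();      // wait so device printf output is flushed
    return 0;
}
```

Note that `cuPrintf` is unnecessary here: on compute capability 2.x and later, plain `printf` works inside kernels.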
answered Sep 20 '22 by phil


The declaration void testKernel(int param) says that param is passed by value, not by reference. In other words, the stack contains a copy of a's value, not a pointer to a. CUDA copies the stack to the kernel running on the GPU.
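The pass-by-value rule can be seen directly in a sketch (hypothetical names): mutating the parameter inside the kernel does not touch the host variable, because the kernel only ever received a copy of its value.

```cuda
#include <cstdio>

__global__ void mutateKernel(int param) {
    param += 1;                        // changes only the device-side copy
    printf("device sees: %d\n", param);
}

int main(void) {
    int a = 456;
    mutateKernel<<<1, 1>>>(a);         // a's *value* is copied to the device
    cudaDeviceSynchronize();           // wait for the kernel's printf
    printf("host still has: %d\n", a); // a is unchanged: it was passed by value
    return 0;
}
```

To let the kernel modify host-visible data, you would instead pass a pointer to memory allocated with `cudaMalloc` (and copy results back), since a raw host address is not dereferenceable from the device.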

answered Sep 18 '22 by Jesse Hall