
Why do we need cudaDeviceSynchronize() for kernels that use device-side printf?

Tags: c, cuda, gpu, nvidia

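#include <cstdio>

// Each thread prints its index within the block and the argument f.
__global__ void helloCUDA(float f)
{
    printf("Hello thread %d, f=%f\n", threadIdx.x, f);
}

int main()
{
    helloCUDA<<<1, 5>>>(1.2345f);  // launch 1 block of 5 threads
    cudaDeviceSynchronize();       // the call this question is about
    return 0;
}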

Why is cudaDeviceSynchronize() needed after the kernel call here, when in many other places it is not required?

gpuguy asked Oct 05 '13 02:10



1 Answer

A kernel launch is asynchronous. This means it returns control to the CPU thread immediately after the kernel has been queued for execution on the GPU, before the kernel has finished executing.

So what is the next thing in the CPU thread here? Application exit.

At application exit, the application's ability to send output to standard output is terminated by the OS.

Thus the output that is generated later by the kernel has nowhere to go, and you won't see it.

On the other hand, if you use cudaDeviceSynchronize(), then the kernel is guaranteed to finish (and the output from the kernel will find a waiting standard output queue), before the application is allowed to exit.
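As a rough illustration of the points above, here is a minimal sketch based on the question's program. The helper `launchAndWait` and the host-side messages are hypothetical additions for this example, not part of the original post. It checks the launch for errors, prints from the host right after the launch returns, and then waits for the kernel.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Kernel from the question: each thread prints one line via device-side printf.
__global__ void helloCUDA(float f)
{
    printf("Hello thread %d, f=%f\n", threadIdx.x, f);
}

// Hypothetical helper (not in the original post): launch the kernel with the
// given number of threads, wait for it, and return the first error seen.
cudaError_t launchAndWait(int threads)
{
    helloCUDA<<<1, threads>>>(1.2345f);

    // Launch-configuration errors are reported right after the launch.
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) return err;

    // The launch has already returned; the kernel may still be running.
    printf("host: launch returned\n");

    // Block until the kernel finishes. The device printf buffer is flushed
    // to standard output at this synchronization point.
    return cudaDeviceSynchronize();
}

int main()
{
    cudaError_t err = launchAndWait(5);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

This also hints at why many programs get away without an explicit cudaDeviceSynchronize(): blocking calls such as cudaMemcpy() wait for previously launched work and flush the device printf buffer as well, so code that copies results back to the host is already synchronized at that point. The question's program has no such call, so the explicit synchronization is what keeps the process alive until the output is delivered.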

Robert Crovella answered Sep 27 '22 19:09