#include <cstdio>
#include <cuda_runtime.h>

__global__ void helloCUDA(float f)
{
    // Runs once per thread; threadIdx.x identifies the thread within the block
    printf("Hello thread %d, f=%f\n", threadIdx.x, f);
}

int main()
{
    helloCUDA<<<1, 5>>>(1.2345f);   // launch 1 block of 5 threads
    cudaDeviceSynchronize();        // wait for the kernel to finish
    return 0;
}
Why is cudaDeviceSynchronize() used in so many places? For example here: is it actually required after the kernel call?
cudaDeviceSynchronize() forces the program to wait until all kernels and memcpys in the stream(s) are complete before continuing. This makes it easier to find out where illegal accesses are occurring, since the failure will show up during the sync.
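To illustrate, here is a minimal sketch of how the sync surfaces an error from an earlier, asynchronous launch. The kernel name badKernel and the deliberately invalid pointer are made up for this example:

#include <cstdio>
#include <cuda_runtime.h>

__global__ void badKernel(int *out)
{
    out[threadIdx.x] = threadIdx.x;  // illegal write if out was never allocated
}

int main()
{
    int *d_out = nullptr;            // deliberately never allocated
    badKernel<<<1, 5>>>(d_out);

    // The launch itself typically reports cudaSuccess: the kernel has only
    // been queued, not executed yet.
    printf("launch: %s\n", cudaGetErrorString(cudaGetLastError()));

    // The sync waits for the kernel to finish, so the illegal access shows
    // up here rather than at some later, unrelated API call.
    printf("sync:   %s\n", cudaGetErrorString(cudaDeviceSynchronize()));
    return 0;
}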
Figure 1 shows that a CUDA kernel is a function that gets executed on the GPU. The parallel portion of your application is executed K times in parallel by K different CUDA threads, as opposed to only once like a regular C/C++ function.

Figure 1. The kernel is a function executed on the GPU.
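As a concrete sketch of the "K times in parallel" idea (the kernel name addOne and the size K are illustrative, not from the figure): each of the K threads runs the kernel body once, on its own element, instead of a host loop iterating K times.

#include <cstdio>
#include <cuda_runtime.h>

__global__ void addOne(float *data)
{
    int i = threadIdx.x;   // each thread works on a distinct element
    data[i] += 1.0f;
}

int main()
{
    const int K = 8;
    float h[K] = {0};
    float *d;
    cudaMalloc(&d, K * sizeof(float));
    cudaMemcpy(d, h, K * sizeof(float), cudaMemcpyHostToDevice);

    addOne<<<1, K>>>(d);   // the body runs K times in parallel, once per thread

    // Device-to-host copy implicitly waits for the kernel to finish
    cudaMemcpy(h, d, K * sizeof(float), cudaMemcpyDeviceToHost);
    printf("h[0]=%f h[%d]=%f\n", h[0], K - 1, h[K - 1]);
    cudaFree(d);
    return 0;
}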
In CUDA, kernel launches are asynchronous (often called “non-blocking”). From the host's perspective, kernel execution proceeds as follows: 1. The host call starts the kernel execution. 2. The call returns immediately; the host thread continues without waiting. 3. The kernel runs to completion on the GPU at some later point.
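A minimal sketch of these steps in action, using cudaStreamQuery() on the default stream to observe that the kernel is still running after the launch call has returned. The spin kernel and the cycle count are illustrative, and exact timing is not guaranteed:

#include <cstdio>
#include <cuda_runtime.h>

__global__ void spin(long long cycles)
{
    long long start = clock64();
    while (clock64() - start < cycles) { }   // keep the GPU busy for a while
}

int main()
{
    spin<<<1, 1>>>(100000000LL);

    // Control is already back on the host; the kernel is still running,
    // so querying the default stream typically reports cudaErrorNotReady.
    printf("right after launch: %s\n", cudaGetErrorString(cudaStreamQuery(0)));

    cudaDeviceSynchronize();                 // now actually wait for it
    printf("after sync:         %s\n", cudaGetErrorString(cudaStreamQuery(0)));
    return 0;
}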
Under dynamic parallelism, one kernel may launch another kernel, and that kernel may launch another, and so on. Each subordinate launch is considered a new “nesting level,” and the total number of levels is the “nesting depth” of the program.
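For illustration, a minimal sketch of a nesting depth of two (the kernel names are made up; dynamic parallelism requires a device of compute capability 3.5 or higher and compilation with relocatable device code, e.g. nvcc -rdc=true):

#include <cstdio>

__global__ void child()
{
    printf("child thread %d\n", threadIdx.x);   // nesting level 2
}

__global__ void parent()
{
    printf("parent thread %d\n", threadIdx.x);  // nesting level 1
    child<<<1, 2>>>();                          // device-side launch
    // The parent grid is not considered complete until the child grid
    // finishes, so the host-side sync below covers both levels.
}

int main()
{
    parent<<<1, 1>>>();
    cudaDeviceSynchronize();
    return 0;
}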
A kernel launch is asynchronous. This means it returns control to the CPU thread immediately after starting up the GPU process, before the kernel has finished executing.
So what is the next thing in the CPU thread here? Application exit.
At application exit, the process's ability to send output to standard output is terminated by the OS.
Thus the output that is generated later by the kernel has nowhere to go, and you won't see it.
On the other hand, if you use cudaDeviceSynchronize(), then the kernel is guaranteed to finish (and its output will find a waiting standard output queue) before the application is allowed to exit.
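For contrast, here is the same program with the sync removed. On most setups the kernel's output is never seen, because the process exits before the device-side printf buffer can be flushed; this is exactly the failure mode described above, though behavior at exit is not guaranteed:

#include <cstdio>

__global__ void helloCUDA(float f)
{
    printf("Hello thread %d, f=%f\n", threadIdx.x, f);
}

int main()
{
    helloCUDA<<<1, 5>>>(1.2345f);
    // No cudaDeviceSynchronize(): main returns immediately, the process
    // begins to exit, and the "Hello thread ..." lines are typically lost.
    return 0;
}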