What is the best way to print device variables in CUDA outside of the kernel? Do I have to do a cudaMemcpy to the host and then print the resulting values? When I try to use printf on pointers created using cudaMalloc, the program crashes. It seems that most of the attention focuses on printing inside the kernel, not in regular code.
Thanks, Eric
It may come as a surprise, but we can actually print text to the standard output from directly within a CUDA kernel; not only that, each individual thread can print its own output.
printf prints formatted output from a kernel to a host-side output stream. The output buffer for printf() is set to a fixed size before kernel launch (see Associated Host-Side API). It is circular and if more output is produced during kernel execution than can fit in the buffer, older output is overwritten.
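As a rough illustration of the point above, here is a minimal sketch of per-thread printf from a kernel. The kernel name hello_kernel, the launch configuration, and the 4 MiB buffer size are arbitrary choices for this example, not anything from the original posts. cudaDeviceSetLimit with cudaLimitPrintfFifoSize is the host-side call that sizes the buffer, and the output is only flushed at synchronization points such as cudaDeviceSynchronize().

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread prints its own line into the device-side buffer.
__global__ void hello_kernel()
{
    printf("block %d, thread %d\n", blockIdx.x, threadIdx.x);
}

int main()
{
    // Optionally enlarge the printf buffer before the first launch.
    // 4 MiB is an arbitrary value chosen for this sketch.
    cudaDeviceSetLimit(cudaLimitPrintfFifoSize, 4 * 1024 * 1024);

    hello_kernel<<<2, 4>>>();

    // The buffered output is flushed to stdout when the host synchronizes.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

The order in which the threads' lines appear is not guaranteed, so do not rely on it when reading the output.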
cudaDeviceSynchronize() returns any error code that has occurred in any of those kernels. Note that when a thread calls cudaDeviceSynchronize(), it is not aware of which kernel launch constructs have already been executed by other threads in the block.
"When I try to use printf on pointers created using cudaMalloc, the program crashes"
If you have this:
int *d_data, *h_data;
cudaMalloc(&d_data, DSIZE);
You cannot do this:
printf(" %d ", *d_data);
as this requires dereferencing a device pointer (d_data) in host code, which is normally illegal in CUDA.
Instead you can do this:
h_data = (int *)malloc(DSIZE);
cudaMemcpy(h_data, d_data, DSIZE, cudaMemcpyDeviceToHost);
printf(" %d ", *h_data);
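For completeness, the copy-then-print approach might look like the following as a whole program. This is only a sketch: the element count N, the DSIZE definition, and the cudaMemset that gives the buffer known contents are assumptions added for illustration, since the snippets above do not say how DSIZE is defined or how d_data is filled.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define N 4                          // assumed element count for this sketch
#define DSIZE (N * sizeof(int))      // assumed definition of DSIZE, in bytes

int main()
{
    int *d_data = NULL;
    int *h_data = (int *)malloc(DSIZE);
    if (h_data == NULL) return 1;

    cudaError_t err = cudaMalloc(&d_data, DSIZE);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed: %s\n", cudaGetErrorString(err));
        free(h_data);
        return 1;
    }

    // Give the device buffer defined contents; a real program would run a kernel here.
    cudaMemset(d_data, 0, DSIZE);

    // Copy device memory into host memory, then print the host copy.
    err = cudaMemcpy(h_data, d_data, DSIZE, cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMemcpy failed: %s\n", cudaGetErrorString(err));
    } else {
        for (int i = 0; i < N; ++i)
            printf(" %d ", h_data[i]);
        printf("\n");
    }

    cudaFree(d_data);
    free(h_data);
    return err == cudaSuccess ? 0 : 1;
}
```

Checking the return codes matters here, because a failed cudaMalloc or cudaMemcpy would otherwise leave h_data holding garbage that prints without complaint.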
You can also investigate Unified Memory, which was introduced in CUDA 6, and see if it will serve your purposes.
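A minimal sketch of the Unified Memory route, assuming a device and driver that support managed memory. The kernel name fill and the array size are made up for this example. With cudaMallocManaged the same pointer is valid on both host and device, so the host can print through it directly once the kernel has finished.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel that writes something recognizable into the array.
__global__ void fill(int *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = i * i;
}

int main()
{
    const int n = 8;                 // arbitrary size for this sketch
    int *data = NULL;

    // Managed allocation: accessible from host and device through one pointer.
    if (cudaMallocManaged(&data, n * sizeof(int)) != cudaSuccess) {
        fprintf(stderr, "cudaMallocManaged failed\n");
        return 1;
    }

    fill<<<1, n>>>(data, n);

    // Wait for the kernel to finish before the host touches the managed memory.
    if (cudaDeviceSynchronize() != cudaSuccess) {
        fprintf(stderr, "kernel execution failed\n");
        cudaFree(data);
        return 1;
    }

    for (int i = 0; i < n; ++i)
        printf(" %d ", data[i]);     // dereferenced in host code, no cudaMemcpy
    printf("\n");

    cudaFree(data);
    return 0;
}
```

The synchronization before the host loop is the important part: reading managed memory on the host while a kernel may still be writing it is not safe.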
And, as mentioned in the comments, devices of compute capability 2.0 or greater support printf from the kernel, which operates on device data (only).
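If the goal is just to inspect a cudaMalloc'd buffer without copying it back, one possible approach is a small single-thread helper kernel that prints the values. The name print_ints and the buffer size below are hypothetical, and this is a debugging sketch rather than a recommended pattern for large arrays.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical debug helper: one thread walks the array and prints each element.
__global__ void print_ints(const int *data, int n)
{
    for (int i = 0; i < n; ++i)
        printf("data[%d] = %d\n", i, data[i]);
}

int main()
{
    const int n = 4;                 // arbitrary size for this sketch
    int *d_data = NULL;

    if (cudaMalloc(&d_data, n * sizeof(int)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }
    cudaMemset(d_data, 0, n * sizeof(int));   // give the buffer defined contents

    // Launch with a single thread so the lines come out in index order.
    print_ints<<<1, 1>>>(d_data, n);

    // Synchronize so the device-side printf buffer is flushed to stdout.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess)
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));

    cudaFree(d_data);
    return err == cudaSuccess ? 0 : 1;
}
```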