I am writing a CUDA program and trying to print something inside the CUDA kernels using the printf function, but when I compile the program I get the following errors:
error : calling a host function("printf") from a __device__/__global__ function("agent_movement_top") is not allowed
error MSB3721: The command ""C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.2\bin\nvcc.exe" -gencode=arch=compute_10,code=\"sm_10,compute_10\" --use-local-env --cl-version 2008 -ccbin "c:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\bin" -I"C:\ProgramData\NVIDIA Corporation\NVIDIA GPU Computing SDK 4.2\C\common\inc" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.2\include" -G --keep-dir "Debug" -maxrregcount=0 --machine 32 --compile -g -Xcompiler "/EHsc /nologo /Od /Zi /MDd " -o "Debug\test.cu.obj" "C:\Users\umdutta\Desktop\SANKHA_ALL_MATERIALS\PROGRAMMING_FOLDER\ABM_MODELLING_2D_3D\TRY_NUM_2\test_proj_test\test_proj\test_proj\test.cu"" exited with code 2.
I am using a GTX 560 Ti, which has compute capability 2.1 (greater than 2.0). From what I have read about printing from CUDA kernels, I need to change the compiler target from sm_10 to sm_20 to take full advantage of the card; others have suggested cuPrintf for this purpose. I am a bit confused about what the simplest and quickest way is to get the printouts onto my console screen. If I need to change the nvcc target from 1.0 to 2.0, what should I do? One more thing I would like to mention: I am using Windows 7 and programming in Visual Studio 2010. Thanks for all your help.
It may come as a surprise, but we can actually print text to standard output directly from within a CUDA kernel; not only that, each individual thread can print its own output.
A CUDA kernel is a function that gets executed on the GPU. The parallel portion of your application is executed K times in parallel by K different CUDA threads, as opposed to only once like a regular C/C++ function.
In CUDA, the host refers to the CPU and its memory, while the device refers to the GPU and its memory. Code running on the host can manage memory on both the host and the device, and it also launches kernels, which are functions executed on the device. These kernels are executed by many GPU threads in parallel.
cudaDeviceSynchronize() is important in host code when we have multiple streams or when we want to debug the code: kernel launches are asynchronous, and device-side printf output is only flushed to the host at synchronization points such as this call.
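To make the host/device split concrete, here is a minimal sketch (the kernel and variable names are illustrative, not taken from the original question): the kernel runs once per thread on the device, while the host allocates device memory, launches the kernel, waits for it to finish, and copies the result back.

#include <cstdio>
#include <cuda_runtime.h>

// Device code: executed K times in parallel, once per CUDA thread.
__global__ void fillKernel(int *data)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    data[i] = i;
}

// Host code: manages memory on both sides and launches the kernel.
int main()
{
    const int K = 8;
    int h_data[K];
    int *d_data = NULL;

    cudaMalloc(&d_data, K * sizeof(int));       // allocate device memory
    fillKernel<<<2, 4>>>(d_data);               // launch K = 2 * 4 threads in parallel
    cudaDeviceSynchronize();                    // wait for the kernel (launches are asynchronous)
    cudaMemcpy(h_data, d_data, K * sizeof(int), // copy the result back to the host
               cudaMemcpyDeviceToHost);
    cudaFree(d_data);

    for (int i = 0; i < K; ++i)
        printf("h_data[%d] = %d\n", i, h_data[i]);
    return 0;
}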
One way of solving this problem is to use the cuPrintf function, which is capable of printing from kernels. Copy the files cuPrintf.cu and cuPrintf.cuh from the folder
C:\ProgramData\NVIDIA Corporation\NVIDIA GPU Computing SDK 4.2\C\src\simplePrintf
to your project folder, add the header file cuPrintf.cuh to your project, and add #include "cuPrintf.cu" to your code. Your code should then follow the format shown below:
#include "cuPrintf.cu"
__global__ void testKernel(int val)
{
cuPrintf("Value is: %d\n", val);
}
int main()
{
cudaPrintfInit();
testKernel<<< 2, 3 >>>(10);
cudaPrintfDisplay(stdout, true);
cudaPrintfEnd();
return 0;
}
By following the above procedure you can get printed output on the console window from a device function.
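With the <<<2, 3>>> launch above, you should see "Value is: 10" printed six times, once for each of the 2 x 3 threads; passing true as the second argument of cudaPrintfDisplay should also prefix each line with the block and thread that produced it.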
Though I solved my issue in the way described above, I still don't have a solution for using printf from a device function. If it is true that I absolutely must upgrade my nvcc target from sm_10 to sm_21 to enable the printf feature, it would be very helpful if someone could show me how. Thanks for all your cooperation.
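For the record, device-side printf does require building for compute capability 2.0 or higher; with an sm_10 build the compiler treats printf as a host-only function, which is exactly the error above. A hedged sketch of the fix, assuming the CUDA 4.2 build customization for Visual Studio 2010: in the project properties, under CUDA C/C++ -> Device -> Code Generation, replace compute_10,sm_10 with compute_20,sm_21 (matching the GTX 560 Ti). The equivalent plain command line would be something like:

nvcc -arch=sm_21 test.cu -o test

After that, a kernel can call printf directly, provided cstdio (or stdio.h) is included and the host synchronizes after the launch so the device's output buffer is flushed:

#include <cstdio>

__global__ void testKernel(int val)
{
    // Device-side printf: available on compute capability 2.0 and later
    printf("Value is: %d (block %d, thread %d)\n", val, blockIdx.x, threadIdx.x);
}

int main()
{
    testKernel<<<2, 3>>>(10);
    cudaDeviceSynchronize();   // flush the device-side printf buffer to the console
    return 0;
}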