I didn't think I was a complete newbie with Cuda, but apparently I am.
I recently upgraded my CUDA device from one with compute capability 1.3 to one with 2.1 (GeForce GT 630). I thought I would do a full upgrade to CUDA toolkit 5.0 as well.
I can compile general cuda kernels, but printf is not working even with -arch=sm_20 set.
Code:
#include <stdio.h>
#include <assert.h>
#include <cuda.h>
#include <cuda_runtime.h>
__global__ void test(){
printf("Hi Cuda World");
}
int main( int argc, char** argv )
{
test<<<1,1>>>();
return 0;
}
Compiler:
Error 2 error MSB3721: The command ""C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.0\bin\nvcc.exe" -gencode=arch=compute_10,code=\"sm_20,compute_10\" --use-local-env --cl-version 2010 -ccbin "C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\bin" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.0\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.0\include" -G --keep-dir "Debug" -maxrregcount=0 --machine 32 --compile -arch=sm_20 -g -D_MBCS -Xcompiler "/EHsc /W3 /nologo /Od /Zi /RTC1 /MDd " -o "Debug\main.cu.obj" "d:\userstore\documents\visual studio 2010\Projects\testCuda\testCuda\main.cu"" exited with code 2. C:\Program Files (x86)\MSBuild\Microsoft.Cpp\v4.0\BuildCustomizations\CUDA 5.0.targets 592 10 testCuda
Error 1 error : calling a __host__ function("printf") from a __global__ function("test") is not allowed d:\userstore\documents\visual studio 2010\Projects\testCuda\testCuda\main.cu 9 1 testCuda
I'm about done with life because of this problem...done done done. Please talk me down from the rooftops with an answer.
If you're using printf in a kernel, you should call cudaDeviceSynchronize() after the launch:
#include <stdio.h>
#include <assert.h>
#include <cuda.h>
#include <cuda_runtime.h>
__global__ void test(){
    printf("Hi Cuda World");
}
int main( int argc, char** argv )
{
    test<<<1,1>>>();
    cudaDeviceSynchronize();
    return 0;
}
In-kernel printf is only supported on compute capability 2.0 or higher hardware. Because your project is set to build for both compute capability 1.0 and compute capability 2.1, nvcc compiles the code multiple times and builds a multi-architecture fatbinary object. The error is generated during the compute capability 1.0 compilation pass, because the printf call is unsupported for that architecture.
If you remove the compute capability 1.0 build target from your project, the error will disappear.
Alternatively, you could write the kernel like this:
__global__ void test()
{
#if __CUDA_ARCH__ >= 200
    printf("Hi Cuda World");
#endif
}
The __CUDA_ARCH__ symbol will only be >= 200 when building for compute capability 2.0 or higher targets, so this guard lets you compile the code for compute capability 1.x devices without hitting the compilation error. On those devices the kernel simply prints nothing.
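If you are unsure which compute capability your card actually reports, a small host-side check can tell you whether the guarded printf path will be active. This is only a sketch: it assumes the GPU of interest is device 0, and the messages are made up for illustration.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    cudaDeviceProp prop;

    // Assumption: the GPU you care about is device 0.
    cudaError_t err = cudaGetDeviceProperties(&prop, 0);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties failed: %s\n",
                cudaGetErrorString(err));
        return 1;
    }

    // prop.major / prop.minor hold the compute capability, e.g. 2 and 1.
    printf("%s: compute capability %d.%d\n", prop.name, prop.major, prop.minor);

    if (prop.major < 2) {
        printf("In-kernel printf is not supported on this device.\n");
    }
    return 0;
}
```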
If you compile for the correct architecture and still get no output, you also need to ensure that the kernel finishes and the driver flushes the output buffer. To do this, add a synchronizing call after the kernel launch in the host code, for example:
int main( int argc, char** argv )
{
    test<<<1,1>>>();
    cudaDeviceSynchronize();
    return 0;
}
[disclaimer: all code written in browser, never compiled, use at own risk]
If you do both things, you should be able to compile, run and see output.
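Putting the pieces together, a complete file might look like the sketch below. The error checks are my addition rather than part of the answers above; they are there so that a failed launch (for example, a binary built for the wrong architecture) shows up as a message instead of silent missing output. The build line in the comment assumes a plain command-line build with the file named main.cu.

```cuda
// Possible build line (assumption): nvcc -arch=sm_20 main.cu -o testCuda
#include <stdio.h>
#include <cuda_runtime.h>

__global__ void test()
{
    // Only compiled into device code for compute capability 2.0 and up.
#if __CUDA_ARCH__ >= 200
    printf("Hi Cuda World\n");
#endif
}

int main(void)
{
    test<<<1,1>>>();

    // Catches errors from the launch itself (bad configuration, no
    // compatible kernel image for this device, and so on).
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel launch failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Waits for the kernel to finish; the device-side printf buffer is
    // flushed to stdout at this point.
    err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel execution failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

In a Visual Studio project the equivalent of the build line is to leave only the compute capability 2.x entry in the CUDA code generation setting, as described above.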