Compiling .cu files with Dynamic Parallelism (CUDA)

Tags: cuda, dynamic-parallelism

I switched to a new GPU, a GeForce GTX 980 with compute capability (cc) 5.2, so it should support dynamic parallelism. However, I could not compile even a simple example from the programming guide. I won't post the code here since it isn't needed: it is just a __global__ kernel that launches another __global__ kernel.

1) I use VS2013. Under Property Pages -> CUDA C/C++ -> Device, I changed the Code Generation property to compute_35,sm_35, and here is the output:

1>------ Build started: Project: testCublas3, Configuration: Debug Win32 ------
1>  Compiling CUDA source file kernel.cu...
1>  
1>  C:\programs\misha\cuda\Projects\test projects\testCublas3\testCublas3>"C:\Program      Files\NVIDIA GPU Computing Toolkit\CUDA\v6.5\bin\nvcc.exe" -gencode=arch=compute_35,code=\"sm_35,compute_35\" --use-local-env --cl-version 2013 -ccbin "C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin"  -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v6.5\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v6.5\include"  -G   --keep-dir Debug -maxrregcount=0  --machine 32 --compile -cudart static  -g   -DWIN32 -D_DEBUG -D_CONSOLE -D_MBCS -Xcompiler "/EHsc /W3 /nologo /Od /Zi /RTC1 /MDd  " -o Debug\kernel.cu.obj "C:\programs\misha\cuda\Projects\test projects\testCublas3\testCublas3\kernel.cu" 
1>C:/programs/misha/cuda/Projects/test projects/testCublas3/testCublas3/kernel.cu(13): error : kernel launch from __device__ or __global__ functions requires separate compilation mode
1>  kernel.cu
1>C:\Program Files (x86)\MSBuild\Microsoft.Cpp\v4.0\V120\BuildCustomizations\CUDA 6.5.targets(593,9): error MSB3721: The command ""C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v6.5\bin\nvcc.exe" -gencode=arch=compute_35,code=\"sm_35,compute_35\" --use-local-env --cl-version 2013 -ccbin "C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin"  -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v6.5\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v6.5\include"  -G   --keep-dir Debug -maxrregcount=0  --machine 32 --compile -cudart static  -g   -DWIN32 -D_DEBUG -D_CONSOLE -D_MBCS -Xcompiler "/EHsc /W3 /nologo /Od /Zi /RTC1 /MDd  " -o Debug\kernel.cu.obj "C:\programs\misha\cuda\Projects\test projects\testCublas3\testCublas3\kernel.cu"" exited with code 2.

I guess I need an additional option for this compilation, -rdc=true, but I could not find where to set it in VS2013.

2) When I set the Code Generation property to compute_52,sm_52, I get an error: Unsupported gpu architecture 'compute_52'. But my card's cc is 5.2. Does this mean I can compile only for cc 3.5 at most?

Thanks


asked Jan 09 '15 00:01

Mikhail Genkin

1 Answer

Regarding item 1, CUDA dynamic parallelism requires separate compilation and linking (-rdc=true), as well as linking against the device runtime library (-lcudadevrt). Dynamic parallelism that also uses CUBLAS additionally requires linking against the device CUBLAS library (-lcublas_device). Possibly the simplest way to see where all of these settings go in a Visual Studio project is to start from the Visual Studio project for the device CUBLAS sample.
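As an illustration of item 1, here is a minimal sketch of a parent kernel launching a child kernel, with a command-line build in the leading comment. The file name kernel.cu matches the question; the kernel and function names (childKernel, parentKernel, runDemo) are hypothetical and not from the original post. In the Visual Studio CUDA integration the equivalent of -rdc=true is, as far as I know, CUDA C/C++ -> Common -> Generate Relocatable Device Code, with cudadevrt.lib added to the linker inputs; treat the exact menu names as an assumption for your toolkit version.

```cuda
// Build sketch (command line), assuming the file is named kernel.cu:
//   nvcc -arch=sm_35 -rdc=true kernel.cu -o kernel -lcudadevrt
// -rdc=true enables separate compilation (relocatable device code);
// -lcudadevrt links the device runtime needed for device-side launches.
#include <cstdio>
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>

// Child grid: each thread writes twice its global index.
__global__ void childKernel(int *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = 2 * i;
}

// Parent grid: a single thread launches the child grid from device code.
// This launch is what triggers the "requires separate compilation mode"
// error when -rdc=true is missing.
__global__ void parentKernel(int *data, int n)
{
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        int threads = 128;
        int blocks = (n + threads - 1) / threads;
        childKernel<<<blocks, threads>>>(data, n);
    }
}

// Hypothetical host helper: runs the parent kernel and checks the result.
bool runDemo(int n)
{
    int *d = NULL;
    if (cudaMalloc((void **)&d, n * sizeof(int)) != cudaSuccess) return false;

    parentKernel<<<1, 1>>>(d, n);
    // The parent grid is not complete until its child grids are complete,
    // so one host-side synchronize covers both.
    if (cudaDeviceSynchronize() != cudaSuccess) { cudaFree(d); return false; }

    std::vector<int> h(n);
    cudaError_t err = cudaMemcpy(h.data(), d, n * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d);
    if (err != cudaSuccess) return false;

    for (int i = 0; i < n; ++i)
        if (h[i] != 2 * i) return false;
    return true;
}

int main()
{
    bool ok = runDemo(1000);
    printf("%s\n", ok ? "dynamic parallelism demo: ok"
                      : "dynamic parallelism demo: failed");
    return ok ? 0 : 1;
}
```

Running this requires a device of compute capability 3.5 or higher. Targeting the GTX 980 natively would mean -arch=sm_52, which in turn needs a toolkit that recognizes that architecture.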

Regarding item 2, your GTX 980's compute capability 5.2 is not recognized because you need the latest update of the CUDA 6.5 toolkit, which is available here.
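To check both sides of item 2 (what the device reports and which toolkit the program was built against), a small query program along these lines may help. This is a sketch, not part of the original answer; the helper names versionMajor and versionMinor are hypothetical, and device index 0 is an assumption.

```cuda
// Build sketch: nvcc query.cu -o query   (no -rdc needed for this one)
#include <cstdio>
#include <cuda_runtime.h>

// CUDA encodes versions as 1000 * major + 10 * minor (e.g. 6050 for 6.5).
int versionMajor(int v) { return v / 1000; }
int versionMinor(int v) { return (v % 1000) / 10; }

int main()
{
    int runtimeVersion = 0, driverVersion = 0;
    cudaRuntimeGetVersion(&runtimeVersion);   // toolkit runtime linked in
    cudaDriverGetVersion(&driverVersion);     // highest version the driver supports

    cudaDeviceProp prop;
    // Device 0 is assumed here; adjust if several GPUs are installed.
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        printf("no usable CUDA device found\n");
        return 1;
    }

    printf("device: %s, compute capability %d.%d\n",
           prop.name, prop.major, prop.minor);
    printf("runtime %d.%d, driver %d.%d\n",
           versionMajor(runtimeVersion), versionMinor(runtimeVersion),
           versionMajor(driverVersion), versionMinor(driverVersion));
    return 0;
}
```

The compute capability printed here is a property of the hardware. Whether nvcc accepts compute_52 depends only on the installed toolkit, which is why the error can appear even on a cc 5.2 card.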

(Note that the cublas_device capability has been removed from recent versions of CUDA.)


answered Sep 28 '22 21:09

Robert Crovella

