 

call kernel inside CUDA kernel

I am trying to do something like this:

__global__ void foo()
{
    // do stuff
}

__global__ void boo()
{
    foo<<<m, n>>>();
}

But I am getting the error "kernel launch from __device__ or __global__ functions requires separate compilation mode".

I tried googling for an answer and saw some results about "dynamic parallelism", which reportedly requires compute capability 3 or above. I have that (GTX 750 Ti, compute capability 5.0).
I also saw that I need to turn the "rdc" flag on. While that does make the error go away, it makes the compilation fail no matter what (even if I comment everything out).

So how can I achieve my goal, or what might be the problem?
(Using CUDA 11.0.)
I also added "cudadevrt.lib;cudart.lib;" to the linker input in the project properties.

EDIT:
The error it gives when rdc is set to true:

Error MSB3721 The command ""C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\bin\nvcc.exe" -dlink -o "x64\Debug\crimson cuda.device-link.obj" -Xcompiler "/EHsc /W3 /nologo /Od /Zi /Fdx64\Debug\vc142.pdb /RTC1 /MDd " -L"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\bin/crt" -L"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\lib\x64" cudadevrt.lib cudart.lib cudart_static.lib kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib -gencode=arch=compute_50,code=sm_50 -G --machine 64 x64\Debug\CrimsonNet.cu.obj x64\Debug\kernel.cu.obj" exited with code 1.

EDIT 2: I continued to investigate, and it seems the problem occurs while linking the files. I don't fully understand how linking works when rdc is used.

paferllume asked Sep 18 '25 06:09



1 Answer

Using MS VS 2019 and CUDA 11.0, the following steps allowed me to create a dynamic parallelism (CDP) example:

  1. Create a new CUDA Runtime project

  2. In the kernel.cu file that is generated, modify the kernel like so:

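     // Child kernel, launched from the device by addKernel
     __global__ void child_kernel() { printf("hello\n"); }

     __global__ void addKernel(int *c, const int *a, const int *b)
     {
         // Device-side launch: each thread of addKernel launches its own child grid
         child_kernel<<<1, 1>>>();
         int i = threadIdx.x;
         c[i] = a[i] + b[i];
     }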
    
  3. In Project...Properties...CUDA C++...Common set Generate Relocatable Device Code to "Yes"

  4. In Project...Properties...CUDA Linker...General add cudadevrt.lib to Additional Dependencies

  5. Build or rebuild the project. You should then see output like this:

     1>------ Rebuild All started: Project: test23, Configuration: Debug x64 ------
     1>Compiling CUDA source file kernel.cu...
     1>
     1>C:\Users\Robert Crovella\source\repos\test23>"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\bin\nvcc.exe" -gencode=arch=compute_52,code=\"sm_52,compute_52\" --use-local-env -ccbin "C:\Program Files (x86)\Microsoft Visual Studio\2019\Professional\VC\Tools\MSVC\14.20.27508\bin\HostX86\x64" -x cu -rdc=true  -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\include"  -G   --keep-dir x64\Debug -maxrregcount=0  --machine 64 --compile -cudart static  -g   -DWIN32 -DWIN64 -D_DEBUG -D_CONSOLE -D_MBCS -Xcompiler "/EHsc /W3 /nologo /Od /Fdx64\Debug\vc142.pdb /FS /Zi /RTC1 /MDd " -o x64\Debug\kernel.cu.obj "C:\Users\Robert Crovella\source\repos\test23\kernel.cu"
     1>kernel.cu
     1>
     1>C:\Users\Robert Crovella\source\repos\test23>"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\bin\nvcc.exe" -dlink -o x64\Debug\test23.device-link.obj -Xcompiler "/EHsc /W3 /nologo /Od /Zi /Fdx64\Debug\vc142.pdb /RTC1 /MDd " -L"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\bin/crt" -L"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\lib\x64" cudadevrt.lib cudart_static.lib kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib cudart.lib  -gencode=arch=compute_52,code=sm_52 -G --machine 64 x64\Debug\kernel.cu.obj
     1>cudadevrt.lib
     1>cudart_static.lib
     1>kernel32.lib
     1>user32.lib
     1>gdi32.lib
     1>winspool.lib
     1>comdlg32.lib
     1>advapi32.lib
     1>shell32.lib
     1>ole32.lib
     1>oleaut32.lib
     1>uuid.lib
     1>odbc32.lib
     1>odbccp32.lib
     1>cudart.lib
     1>kernel.cu.obj
     1>   Creating library C:\Users\Robert Crovella\source\repos\test23\x64\Debug\test23.lib and object C:\Users\Robert Crovella\source\repos\test23\x64\Debug\test23.exp
     1>test23.vcxproj -> C:\Users\Robert Crovella\source\repos\test23\x64\Debug\test23.exe
     ========== Rebuild All: 1 succeeded, 0 failed, 0 skipped ==========
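
For readers building outside Visual Studio, the same idea can be sketched as one standalone file. This is a minimal sketch, not the project from the steps above. The file name `cdp_demo.cu`, the kernel name `parent_kernel`, and the `sm_50` target (chosen to match the asker's GTX 750 Ti) are assumptions for illustration. The build line in the comment uses `-rdc=true` and `-lcudadevrt`, which correspond to steps 3 and 4.

```cuda
// Assumed command-line build (adjust sm_50 to match your GPU):
//   nvcc -rdc=true -arch=sm_50 -o cdp_demo cdp_demo.cu -lcudadevrt
#include <cstdio>
#include <cuda_runtime.h>

// Child kernel, launched from device code.
__global__ void child_kernel(int parent_thread)
{
    printf("hello from the child of parent thread %d\n", parent_thread);
}

// Parent kernel (hypothetical name): each thread launches one child grid.
__global__ void parent_kernel()
{
    child_kernel<<<1, 1>>>(threadIdx.x);
}

int main()
{
    // Launch the parent with a single block of two threads.
    parent_kernel<<<1, 2>>>();

    // Report launch-time errors, such as a missing device image.
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        fprintf(stderr, "launch failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Wait for the parent and all child grids; this also flushes device printf.
    err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "execution failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

If the `-rdc=true` flag is left out, nvcc should report the same "requires separate compilation mode" error quoted in the question.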
    

Notes:

  1. CUDA 11.0 (and higher) targets only devices that support CDP. For earlier versions, you may need to set the device code generation target to match a GPU that supports CDP (e.g. compute_35,sm_35).

  2. In MS VS, the MSB3721 error is not that useful by itself. It simply indicates "something went wrong". To get more useful info from Visual Studio, you should increase the verbosity of the console output. The exact method varies by VS version, but you can find instructions with a web search. The objective is to make VS show the actual output generated by nvcc when there is an error.

  3. For CUDA 11.0/VS2019, the addition of cudadevrt.lib isn't necessary because it is already included in the project. For other/older versions it may be necessary.
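
Related to note 1, a program can also check at run time whether the installed GPU supports CDP, which requires compute capability 3.5 or higher. The following is a minimal sketch, assuming the GPU of interest is device 0. The variable name `supports_cdp` is hypothetical.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    const int device = 0;  // assumption: check the first GPU in the system

    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, device);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties failed: %s\n",
                cudaGetErrorString(err));
        return 1;
    }

    // Dynamic parallelism needs compute capability 3.5 or higher.
    const bool supports_cdp =
        (prop.major > 3) || (prop.major == 3 && prop.minor >= 5);

    printf("%s: compute capability %d.%d, dynamic parallelism %s\n",
           prop.name, prop.major, prop.minor,
           supports_cdp ? "supported" : "not supported");

    return supports_cdp ? 0 : 1;
}
```

This file needs no relocatable device code, so a plain `nvcc` compile is enough.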

If you're still having trouble, I suggest you increase the verbosity to get a better idea of the exact issue. You should also try the steps listed above exactly to make sure you understand them (i.e. starting with a new project). If you're still having trouble, post a new question with your actual code, as well as the console compile output after you increase the verbosity.

Robert Crovella answered Sep 19 '25 20:09
