How to keep kernel code in a separate .cu file instead of the main .cpp?

Question

How can I separate the CUDA kernel code from the other C++ code in my project? I want to collect all the kernel definitions in a single file and have the other .cpp files call them when needed. I tried writing all the kernels in kernel.cu and calling them by including the kernel.cu file, but the build fails with the following errors:

/usr/bin/ld: error: ./vector_summation.o: multiple definition of 'perform_summation_method1(int*, int)'
/usr/bin/ld: ./kernels.o: previous definition here
/usr/bin/ld: error: ./vector_summation.o: multiple definition of '__device_stub__Z25perform_summation_method1Pii(int*, int)'
/usr/bin/ld: ./kernels.o: previous definition here
/usr/bin/ld: error: ./vector_summation.o: multiple definition of '__device_stub__Z25perform_summation_method2PiS_i(int*, int*, int)'
/usr/bin/ld: ./kernels.o: previous definition here
/usr/bin/ld: error: ./vector_summation.o: multiple definition of 'perform_summation_method2(int*, int*, int)'
/usr/bin/ld: ./kernels.o: previous definition here
/usr/bin/ld: error: ./vector_summation.o: multiple definition of '__device_stub__Z25perform_summation_method3PiS_i(int*, int*, int)'
/usr/bin/ld: ./kernels.o: previous definition here
/usr/bin/ld: error: ./vector_summation.o: multiple definition of 'perform_summation_method3(int*, int*, int)'
/usr/bin/ld: ./kernels.o: previous definition here

Robert Crovella · Accepted Answer

You do it essentially the same way you do it with ordinary .cpp files/modules. In C++, you don't normally include one .cpp file in another when you want to access functions from the other file. You include headers, which normally contain only the function prototypes.
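The linker errors in the question follow from this: `#include`-ing the .cu file pastes every kernel definition into the including file, so both vector_summation.o and kernels.o end up defining the same functions. As a CUDA-free illustration of the declaration/definition split, here is a minimal sketch in plain C++. The file names in the comments and the functions `sum_vector` and `sum_first_three` are hypothetical, and the three sections are shown in one block only so the sketch compiles as a single unit.

```cpp
// ---- kernels.h (hypothetical name): declarations only ----
int sum_vector(const int* data, int n);

// ---- kernels.cpp: the one and only definition ----
int sum_vector(const int* data, int n) {
  int total = 0;
  for (int i = 0; i < n; ++i) total += data[i];
  return total;
}

// ---- vector_summation.cpp: a caller that needs only the declaration ----
int sum_first_three() {
  const int values[3] = {1, 2, 3};
  return sum_vector(values, 3);
}
```

In a real project each section would live in its own file, and only kernels.h would be included by the caller.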

Here is one example:

test.h:

void my_cuda_func();

main.cpp:

#include <stdio.h>
#include "test.h"

int main(){
  my_cuda_func();
  return 0;
}

test.cu:

#include <stdio.h>
#include "test.h"


__global__ void my_kernel(){
  printf("Hello!\n");
}

void my_cuda_func(){
  my_kernel<<<1,1>>>();
  cudaDeviceSynchronize();
}

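Use the following commands to build (sm_20 is no longer supported by recent CUDA toolkits, so substitute the -arch value that matches your GPU):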

g++ -c main.cpp
nvcc -arch=sm_20 -c test.cu
g++  -o test main.o test.o -L/usr/local/cuda/lib64 -lcudart

There are other approaches of course. If you want to link to C instead of C++ you need to take that into account. If you want to call kernels directly from other modules instead of using a wrapper function, then you need to pass all your modules through nvcc instead of g++ (and they should all be .cu files). Also, if you want to have multiple files with GPU device code (e.g. kernel definitions) then you need to get familiar with using the device code linker.
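On the point about linking to C: one common way to make a wrapper callable from C code is to give it C linkage with `extern "C"`. The sketch below is an assumption about how that could look, not part of the original answer. `my_cuda_func_c` is a hypothetical variant of the wrapper that returns a status code, and its body is a host-only placeholder for the kernel launch.

```cpp
// ---- test.h (sketch): declaration usable from both C and C++ callers ----
#ifdef __cplusplus
extern "C" {
#endif

/* Hypothetical wrapper; returns 0 on success so a C caller can check it. */
int my_cuda_func_c(void);

#ifdef __cplusplus
}
#endif

// ---- definition (would live in test.cu and be compiled by nvcc) ----
extern "C" int my_cuda_func_c(void) {
  // In test.cu, the kernel launch and cudaDeviceSynchronize() would go here.
  // This placeholder does no GPU work and simply reports success.
  return 0;
}
```

The `__cplusplus` guard lets the same header be included from a .c file, where `extern "C"` is not valid syntax.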

For completeness, here is the test.h / main.cpp / test.cu example re-worked to show what to do if you want all kernel definitions in one file but still want to invoke kernels directly from another module:

test.h:

__global__ void my_kernel();

main.cu:

#include <stdio.h>
#include "test.h"

int main(){
  my_kernel<<<1,1>>>();
  cudaDeviceSynchronize();
  return 0;
}

test.cu:

#include <stdio.h>
#include "test.h"


__global__ void my_kernel(){
  printf("Hello!\n");
}

Build with:

nvcc -arch=sm_20 -c main.cu
nvcc -arch=sm_20 -c test.cu
nvcc -arch=sm_20 -o test main.o test.o
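A practical consequence of the two layouts: a header that declares a `__global__` function can only be included from files compiled by nvcc, because g++ does not know the `__global__` keyword. One possible way to share a single header between .cpp and .cu files is to guard the kernel declaration with the `__CUDACC__` macro, which nvcc defines when compiling CUDA source. The sketch below reuses the names from the examples; `compiled_as_cuda` is a hypothetical helper added only to make the effect of the guard observable.

```cpp
// ---- test.h (sketch): includable from both main.cpp (g++) and test.cu (nvcc) ----
#ifndef TEST_H
#define TEST_H

#ifdef __CUDACC__
// Visible only to nvcc; g++ never sees the __global__ keyword.
__global__ void my_kernel();
#endif

// Host wrapper declaration, visible to every compiler.
void my_cuda_func();

// Hypothetical helper: reports which branch of the guard was taken.
inline bool compiled_as_cuda() {
#ifdef __CUDACC__
  return true;
#else
  return false;
#endif
}

#endif  // TEST_H
```

With this layout, .cpp files call only `my_cuda_func`, while .cu files can also launch `my_kernel` directly.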

Jaa-c · Answer

You can create a CUDA header file (*.cuh) and include it like a standard header. I would not put only kernels in a separate file. Instead, keep them together with the host functions that set them up, and expose only those functions in the header: outside code usually does not launch kernels directly, it calls a function that takes care of memory allocation, transfers, and so on. I usually write headers like this:

#ifndef CUDAHEADER_CUH
#define CUDAHEADER_CUH

/** Initialize cuda stuff */
void cudaInit(Data * host_data);

/** Cleanup, frees resources used by the device. */
void cudaFinalize();

#endif

Then there is a .cu file containing the kernels, the device functions, and the host functions that manage the CUDA resources:

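#include "cudaHeader.cuh"

// Some global variables, for example:
Data * device_data;

// Some kernels and device functions:
__global__ void someKernel(Data * device_data) {
    ...
}

void cudaInit(Data * host_data) {
    // cudaMalloc(...) calls that allocate device_data go here
    // cudaMemcpy(...) calls that copy host_data to the device go here
    // gridRes and blockRes stand for the launch dimensions
    someKernel<<< gridRes, blockRes >>>(device_data);
}

void cudaFinalize() {
    // Release the device memory allocated in cudaInit
    cudaFree(device_data);
}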

There are other ways to organize your code as well.

Tags: include, cuda

erogol