How to separate CUDA code into multiple files

Tags: c++, c, visual-studio-2008, cuda

I am trying to separate a CUDA program into two separate .cu files in an effort to edge closer to writing a real app in C++. I have a simple little program that:

Allocates memory on the host and the device.
Initializes the host array to a series of numbers.
Copies the host array to a device array.
Finds the square of all the elements in the array using a device kernel.
Copies the device array back to the host array.
Prints the results.

This works great if I put it all in one .cu file and run it. When I split it into two separate files I start getting linking errors. Like all my recent questions, I know this is something small, but what is it?

KernelSupport.cu

#ifndef _KERNEL_SUPPORT_
#define _KERNEL_SUPPORT_

#include <iostream>
#include <MyKernel.cu>

int main( int argc, char** argv) 
{
    int* hostArray;
    int* deviceArray;
    const int arrayLength = 16;
    const unsigned int memSize = sizeof(int) * arrayLength;

    hostArray = (int*)malloc(memSize);
    cudaMalloc((void**) &deviceArray, memSize);

    std::cout << "Before device\n";
    for(int i=0;i<arrayLength;i++)
    {
        hostArray[i] = i+1;
        std::cout << hostArray[i] << "\n";
    }
    std::cout << "\n";

    cudaMemcpy(deviceArray, hostArray, memSize, cudaMemcpyHostToDevice);
    TestDevice <<< 4, 4 >>> (deviceArray);
    cudaMemcpy(hostArray, deviceArray, memSize, cudaMemcpyDeviceToHost);

    std::cout << "After device\n";
    for(int i=0;i<arrayLength;i++)
    {
        std::cout << hostArray[i] << "\n";
    }

    cudaFree(deviceArray);
    free(hostArray);

    std::cout << "Done\n";
}

#endif

MyKernel.cu

#ifndef _MY_KERNEL_
#define _MY_KERNEL_

__global__ void TestDevice(int *deviceArray)
{
    int idx = blockIdx.x*blockDim.x + threadIdx.x;
    deviceArray[idx] = deviceArray[idx]*deviceArray[idx];
}


#endif

Build Log:

1>------ Build started: Project: CUDASandbox, Configuration: Debug x64 ------
1>Compiling with CUDA Build Rule...
1>"C:\CUDA\bin64\nvcc.exe"    -arch sm_10 -ccbin "C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\bin"    -Xcompiler "/EHsc /W3 /nologo /O2 /Zi   /MT  "  -maxrregcount=32  --compile -o "x64\Debug\KernelSupport.cu.obj" "d:\Stuff\Programming\Visual Studio 2008\Projects\CUDASandbox\CUDASandbox\KernelSupport.cu" 
1>KernelSupport.cu
1>tmpxft_000016f4_00000000-3_KernelSupport.cudafe1.gpu
1>tmpxft_000016f4_00000000-8_KernelSupport.cudafe2.gpu
1>tmpxft_000016f4_00000000-3_KernelSupport.cudafe1.cpp
1>tmpxft_000016f4_00000000-12_KernelSupport.ii
1>Linking...
1>KernelSupport.cu.obj : error LNK2005: __device_stub__Z10TestDevicePi already defined in MyKernel.cu.obj
1>KernelSupport.cu.obj : error LNK2005: "void __cdecl TestDevice__entry(int *)" (?TestDevice__entry@@YAXPEAH@Z) already defined in MyKernel.cu.obj
1>D:\Stuff\Programming\Visual Studio 2008\Projects\CUDASandbox\x64\Debug\CUDASandbox.exe : fatal error LNK1169: one or more multiply defined symbols found
1>Build log was saved at "file://d:\Stuff\Programming\Visual Studio 2008\Projects\CUDASandbox\CUDASandbox\x64\Debug\BuildLog.htm"
1>CUDASandbox - 3 error(s), 0 warning(s)
========== Build: 0 succeeded, 1 failed, 0 up-to-date, 0 skipped ==========

I am running Visual Studio 2008 on Windows 7 64bit.

Edit:

I think I need to elaborate on this a little bit. The end result I am looking for here is to have a normal C++ application with something like Main.cpp containing the int main() entry point, and have things run from there. At certain points in my .cpp code I want to be able to reference CUDA bits. So my thinking (and correct me if there is a more standard convention here) is that I will put the CUDA kernel code into its own .cu files, and then have a supporting .cu file that will take care of talking to the device, calling kernel functions and whatnot.

asked Jan 19 '10 04:01

Mr Bell

1 Answer

You are including MyKernel.cu in KernelSupport.cu, so TestDevice gets compiled into both KernelSupport.cu.obj and MyKernel.cu.obj, and the linker sees it defined twice (hence the LNK2005 errors). You'll have to create a header that declares TestDevice and include that instead.

re comment:

Something like this should work

// MyKernel.h
#ifndef mykernel_h
#define mykernel_h
__global__ void TestDevice(int* devicearray);
#endif

and then change the including file to

//KernelSupport.cu
#ifndef _KERNEL_SUPPORT_
#define _KERNEL_SUPPORT_

#include <iostream>
#include "MyKernel.h"
// ...

re your edit

As long as the header you include from C++ code doesn't contain any CUDA-specific syntax (__global__, __device__, kernel launches, etc.), you should be fine linking C++ and CUDA code.
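A rough sketch of what that could look like, assuming a hypothetical wrapper function squareOnDevice and a hypothetical header name KernelSupport.h (neither is from the question). The header part is plain C++; everything CUDA-specific stays in the .cu part. Note that TestDevice here takes an extra length argument so that stray threads can be skipped, which differs from the original kernel. Both parts are shown in one listing for brevity.

```cuda
// ---- KernelSupport.h (hypothetical name): plain C++, safe to include from Main.cpp ----
#ifndef KERNEL_SUPPORT_H
#define KERNEL_SUPPORT_H

// Hypothetical helper: squares each element of hostArray in place on the GPU.
// Returns false if any CUDA call fails.
bool squareOnDevice(int* hostArray, int length);

#endif

// ---- KernelSupport.cu: compiled by nvcc; the only place CUDA syntax appears ----
#include <cuda_runtime.h>
// #include "KernelSupport.h"   // use this once the two parts live in separate files

// Same idea as the kernel in the question, plus a bounds check (an addition).
__global__ void TestDevice(int* deviceArray, int length)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < length)
        deviceArray[idx] = deviceArray[idx] * deviceArray[idx];
}

bool squareOnDevice(int* hostArray, int length)
{
    if (length <= 0)
        return true;  // nothing to do

    const size_t memSize = sizeof(int) * length;
    int* deviceArray = 0;
    if (cudaMalloc((void**)&deviceArray, memSize) != cudaSuccess)
        return false;

    bool ok = cudaMemcpy(deviceArray, hostArray, memSize,
                         cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        const int threadsPerBlock = 4;  // arbitrary, matches the question's launch
        const int blocks = (length + threadsPerBlock - 1) / threadsPerBlock;
        TestDevice<<<blocks, threadsPerBlock>>>(deviceArray, length);

        // Check the launch, then copy the results back (cudaMemcpy blocks
        // until the kernel has finished).
        ok = cudaGetLastError() == cudaSuccess
          && cudaMemcpy(hostArray, deviceArray, memSize,
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }

    cudaFree(deviceArray);
    return ok;
}
```

Main.cpp would then include only KernelSupport.h and call squareOnDevice(hostArray, 16) like any other C++ function, so the regular C++ compiler never has to parse __global__ or the <<< >>> launch syntax.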

answered Oct 22 '22 16:10

Scott Wales
