
Compiling part of a C++ program for GPU

Is it possible to compile C++ code for the GPU with nvcc into a shared object (.so file) and load it dynamically from a C++ program (in this case CERN's ROOT, which is essentially a C++ interpreter, "CINT")?

A simple example that I would like to run is:

#include <cstdio>   // printf
#include <cstdlib>  // exit

extern "C"
void TestCompiled() {
  printf("test\n");
  exit(0);
}

This code was compiled with nvcc --compiler-options '-fPIC' -o TestCompiled_C.so --shared TestCompiled.cu. Loading the shared object into ROOT with:

{ // Test.C program
  int error, check;
  check = gROOT->LoadMacro("TestCompiled_C.so", &error);
  cout << "check " << check << " " << " error: " << error << endl;
  TestCompiled();  // run macro
  exit(0); 
}

loads the library OK, but does not find TestCompiled():

$ root -b -l Test.C
root [0] 
Processing Test.C...
check 0  error: 0
Error: Function Hello() is not defined in current scope  Test.C:11:
*** Interpreter error recovered ***

Doing the same by compiling the first test script with ROOT (without the extern line, compiling with root TestCompiled.C++) works… What can I try in order to make the C++ program find the test function when nvcc does the compilation?

Eric O Lebigot asked May 22 '14 13:05


2 Answers

I am assuming that the shared object file being output is like any other shared library, such as one created with GCC using the -shared option. In that case, to load the object dynamically, you need to call dlopen to get a handle to the shared object, then call dlsym to look up a symbol in it.

void *object_handle = dlopen("TestCompiled_C.so", RTLD_NOW);
if (object_handle == NULL)
{
  printf("%s\n", dlerror());
  // Exit or return error code
}
void *test_compiled_ptr = dlsym(object_handle, "TestCompiled");
if (!test_compiled_ptr)
{
  printf("%s\n", dlerror());
  // Exit or return error code
}

void (*test_compiled)() = (void (*)()) test_compiled_ptr;
test_compiled();

You will need to include dlfcn.h and link with -ldl when you compile.

The difference between this and what you are doing now is that you are loading the library statically rather than dynamically. Even though shared objects are "dynamically linked libraries," as they are called in the Windows world, doing it the way you are now loads all of the symbols in the object when the program is launched. To load specific symbols at runtime, you need to do it this way.
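For reference, the dlopen/dlsym steps above can be wrapped in one small function. This is only a sketch, assuming a POSIX system with dlfcn.h; the name load_function and its error handling are hypothetical and are not taken from ROOT or from the original answer.

```cpp
#include <dlfcn.h>    // dlopen, dlsym, dlerror, dlclose (POSIX)
#include <stdexcept>  // std::runtime_error
#include <string>

// Hypothetical helper: opens `library`, looks up `symbol`, and returns the
// address cast to the function-pointer type Fn. On success the library
// handle is deliberately left open so the returned pointer stays valid.
template <typename Fn>
Fn load_function(const std::string& library, const std::string& symbol) {
    void* handle = dlopen(library.c_str(), RTLD_NOW);
    if (handle == nullptr) {
        const char* msg = dlerror();
        throw std::runtime_error(msg ? msg : "dlopen failed");
    }

    dlerror();  // clear any stale error state before calling dlsym
    void* address = dlsym(handle, symbol.c_str());
    const char* error = dlerror();
    if (error != nullptr) {
        std::string message(error);  // copy before dlclose can overwrite it
        dlclose(handle);
        throw std::runtime_error(message);
    }

    // POSIX allows converting the void* returned by dlsym to a function
    // pointer; reinterpret_cast is the usual way to spell that in C++.
    return reinterpret_cast<Fn>(address);
}
```

With the library from the question, the call would look like load_function<void (*)()>("./TestCompiled_C.so", "TestCompiled")(). The leading ./ matters: when the name contains no slash, dlopen searches the system library paths rather than the current directory. The lookup by the plain name "TestCompiled" works only because the function is declared extern "C", which keeps the symbol name unmangled.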

ImOnALampshade answered Sep 25 '22 20:09



I'm copying, for reference, the salient points of the answer from the RootTalk forum that solved the problem:

A key point is that ROOT's C/C++ interpreter (CINT) requires a "CINT dictionary" for the externally compiled function. (There is no problem when compiling through ROOT, because ACLiC creates this dictionary when it pre-compiles the macro with root TestCompiled.C++.)

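So, an interface header TestCompiled.h must be created (it is loaded below as TestCompiled.h++; the trailing ++ asks ACLiC to compile it, as in root TestCompiled.C++):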

#ifdef __cplusplus
extern "C" {
#endif

  void TestCompiled(void);

#ifdef __cplusplus
} /* end of extern "C" */
#endif

The interface must then be loaded inside ROOT along with the shared object:

{ // Test.C ROOT/CINT unnamed macro (interpreted)
  Int_t check, error;
  check = gROOT->LoadMacro("TestCompiled_C.so", &error);
  std::cout << "_C.so check " << check << " error " << error << std::endl;
  check = gROOT->LoadMacro("TestCompiled.h++", &error);
  std::cout << "_h.so check " << check << " error " << error << std::endl;
  TestCompiled(); // execute the compiled function
}

ROOT can now use the externally compiled program: root -b -l -n -q Test.C works.

This can be tested with, e.g., g++ on the following TestCompiled.C:

#include <cstdio>
extern "C" void TestCompiled(void) { printf("test\n"); }

compiled with

g++ -fPIC -shared -o TestCompiled_C.so TestCompiled.C
Eric O Lebigot answered Sep 23 '22 20:09
