 

Can I get the CUDA Compute Capability (version) at compile time via #define?

Tags: cuda, gpgpu, nvcc

How can I get the CUDA Compute Capability (version) at compile time via #define? For example, if I use __ballot and compile with

nvcc -c -gencode arch=compute_20,code=sm_20 \
        -gencode arch=compute_13,code=sm_13 \
        source.cu

can I get the compute capability version in my code via a #define, so that I can choose between a code branch that uses __ballot and one that does not?

asked Oct 02 '12 by Alex



1 Answer

Yes. First, it's best to understand what happens when you use -gencode. NVCC will compile your input device code multiple times, once for each device target architecture. So in your example, NVCC will run compilation stage 1 once for compute_20 and once for compute_13.
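As a quick check (assuming the object file is named source.o), the cuobjdump utility that ships with the CUDA Toolkit can list the embedded images, one cubin per code=sm_xy target:

cuobjdump --list-elf source.o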

When nvcc compiles a .cu file, it defines two preprocessor macros, __CUDACC__ and __CUDA_ARCH__. __CUDACC__ does not have a value; it is simply defined when nvcc is the compiler, and not defined otherwise.
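For example, a common use of __CUDACC__ is in a header shared between .cu and plain C++ files. A minimal sketch (the HOST_DEVICE macro name is just an illustration, not part of any API):

// When nvcc compiles this header, __CUDACC__ is defined, so the CUDA
// qualifiers are emitted; a plain host compiler sees empty macros instead.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

HOST_DEVICE inline float square(float x) { return x * x; }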

__CUDA_ARCH__ is defined to an integer value representing the SM version being compiled.

  • 100 = compute_10
  • 110 = compute_11
  • 200 = compute_20

etc. To quote the NVCC documentation included with the CUDA Toolkit:

The architecture identification macro __CUDA_ARCH__ is assigned a three-digit value string xy0 (ending in a literal 0) during each nvcc compilation stage 1 that compiles for compute_xy. This macro can be used in the implementation of GPU functions for determining the virtual architecture for which it is currently being compiled. The host code (the non-GPU code) must not depend on it.
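In other words (a hypothetical helper to illustrate the two points above, not something from the question):

// __CUDA_ARCH__ is only defined during the device compilation passes,
// where it holds the xy0 value of the virtual architecture being compiled.
__host__ __device__ int compiled_arch()
{
#if defined(__CUDA_ARCH__)
    return __CUDA_ARCH__;   // e.g. 200 for compute_20, 130 for compute_13
#else
    return 0;               // host pass: the macro is not defined
#endif
}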

So, in your case where you want to use __ballot(), you can do this:

....
#if __CUDA_ARCH__ >= 200
    // pred and lanemask come from the surrounding (elided) code
    unsigned int b = __ballot(pred);
    int p = __popc(b & lanemask);
#else
    // do something else for earlier architectures
#endif
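For context, a fuller kernel built around the same pattern might look like the sketch below; count_votes, data, and the predicate are illustrative names rather than anything from the question:

// Hypothetical kernel: each thread votes with a predicate, the warp's votes
// are counted with __ballot/__popc on compute 2.0+, and a plain atomic
// fallback is used on older architectures (atomicAdd needs compute 1.1+).
__global__ void count_votes(const int *data, int *result, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    int pred = (tid < n) ? (data[tid] > 0) : 0;   // this thread's vote

#if __CUDA_ARCH__ >= 200
    unsigned int ballot = __ballot(pred);         // bitmask of votes across the warp
    if ((threadIdx.x & 31) == 0)                  // lane 0 writes one count per warp
        atomicAdd(result, __popc(ballot));
#else
    if (pred)                                     // pre-Fermi path: one atomic per voting thread
        atomicAdd(result, 1);
#endif
}

When the compute_13 pass compiles this, __CUDA_ARCH__ is 130 and the fallback branch is emitted; the compute_20 pass gets the __ballot/__popc branch.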
answered Sep 21 '22 by harrism