How can I get the CUDA compute capability (version) at compile time via a #define? For example, if I use __ballot and compile with
nvcc -c -gencode arch=compute_20,code=sm_20 \
-gencode arch=compute_13,code=sm_13 \
source.cu
can I get the compute capability version in my code via a #define, so I can choose between the branch of code with __ballot and the one without?
Yes. First, it's best to understand what happens when you use -gencode: NVCC compiles your input device code multiple times, once for each target device architecture. So in your example, NVCC will run compilation stage 1 once for compute_20 and once for compute_13.
When nvcc compiles a .cu file, it defines two preprocessor macros: __CUDACC__ and __CUDA_ARCH__. __CUDACC__ has no value; it is simply defined when nvcc is the compiler and left undefined otherwise. __CUDA_ARCH__ is defined to an integer value representing the SM version being compiled for, e.g. 130 for compute_13, 200 for compute_20, and so on. To quote the NVCC documentation included with the CUDA Toolkit:
The architecture identification macro __CUDA_ARCH__ is assigned a three-digit value string xy0 (ending in a literal 0) during each nvcc compilation stage 1 that compiles for compute_xy. This macro can be used in the implementation of GPU functions for determining the virtual architecture for which it is currently being compiled. The host code (the non-GPU code) must not depend on it.
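A minimal sketch of how the two macros are typically used together (the HOST_DEVICE macro and the function name below are illustrative, not part of the quoted documentation):

#ifdef __CUDACC__
// nvcc is driving the compilation (host pass and every device pass).
#define HOST_DEVICE __host__ __device__
#else
// A plain host compiler (e.g. g++ including this header) lands here.
#define HOST_DEVICE
#endif

HOST_DEVICE inline int arch_being_compiled()
{
#if __CUDA_ARCH__ >= 200
    return 200;   // chosen during the compute_20 device pass
#elif __CUDA_ARCH__ >= 130
    return 130;   // chosen during the compute_13 device pass
#else
    return 0;     // host pass: __CUDA_ARCH__ is not defined at all
#endif
}

An undefined macro evaluates to 0 inside #if, which is why the host pass falls through to the last branch; per the documentation quoted above, host code should not otherwise depend on the value of __CUDA_ARCH__.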
So, in your case where you want to use __ballot(), you can do this:
....
#if __CUDA_ARCH__ >= 200
    // Compute capability 2.0+: the warp vote instruction is available.
    // 'predicate' and 'lanemask' come from the surrounding (elided) code.
    unsigned int b = __ballot(predicate);
    int p = __popc(b & lanemask);
#else
    // do something else for earlier architectures
#endif
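For context, here is a self-contained sketch of what that guard can look like inside a complete kernel; the kernel name, the per-warp output layout, and the shared-memory fallback are illustrative assumptions, not part of the original answer.

// Counts, per warp, how many threads see a positive input value.
__global__ void count_positive_per_warp(const int *data, int *warp_counts, int n)
{
    int tid  = blockIdx.x * blockDim.x + threadIdx.x;
    int lane = threadIdx.x & 31;   // lane index within the warp
    int warp = threadIdx.x >> 5;   // warp index within the block

    int predicate = (tid < n) && (data[tid] > 0);

#if __CUDA_ARCH__ >= 200
    // Fermi (sm_20) and later: one warp vote plus a population count.
    unsigned int ballot = __ballot(predicate);
    if (lane == 0)
        warp_counts[blockIdx.x * (blockDim.x >> 5) + warp] = __popc(ballot);
#else
    // Earlier architectures (e.g. sm_13): reduce the predicate through
    // shared memory instead. Assumes blockDim.x is a multiple of 32 and <= 256.
    __shared__ int votes[256];
    votes[threadIdx.x] = predicate;
    __syncthreads();
    if (lane == 0) {
        int count = 0;
        for (int i = 0; i < 32; ++i)
            count += votes[warp * 32 + i];
        warp_counts[blockIdx.x * (blockDim.x >> 5) + warp] = count;
    }
#endif
}

Because the check happens in the preprocessor, each device compilation pass only ever sees the branch that is valid for its architecture, which is what lets a single source file build cleanly with the nvcc command from the question.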