The problem:
In a .h file, I want to define real as double when compiling for C/C++, or for CUDA with compute capability >= 1.3. If compiling for CUDA with compute capability < 1.3, real should be defined as float.
After many hours I came up with this (which does not work):
# if defined(__CUDACC__)
# warning * making definitions for cuda

# if defined(__CUDA_ARCH__)
# warning __CUDA_ARCH__ is defined
# else
# warning __CUDA_ARCH__ is NOT defined
# endif

# if (__CUDA_ARCH__ >= 130)
# define real double
# warning using double in cuda
# elif (__CUDA_ARCH__ >= 0)
# define real float
# warning using float in cuda
# warning how the hell is this printed when __CUDA_ARCH__ is not defined?
# else
# define real
# error what the hell is the value of __CUDA_ARCH__ and how can I print it
# endif

# else
# warning * making definitions for c/c++
# define real double
# warning using double for c/c++
# endif
when I compile (note the -arch flag)
nvcc -arch compute_13 -Ilibcutil testFloatDouble.cu
I get
* making definitions for cuda
__CUDA_ARCH__ is defined
using double in cuda

* making definitions for cuda
warning __CUDA_ARCH__ is NOT defined
warning using float in cuda
how the hell is this printed if __CUDA_ARCH__ is not defined now?

Undefined symbols for architecture i386:
  "myKernel(float*, int)", referenced from:
  ....
I know that files get compiled twice by nvcc. The first pass is OK (__CUDACC__ defined and __CUDA_ARCH__ >= 130), but what happens the second time? Is __CUDACC__ defined but __CUDA_ARCH__ undefined, or does it have a value < 130? Why?
Thanks for your time.
It seems you might be conflating two things - how to differentiate between the host and device compilation trajectories when nvcc is processing CUDA code, and how to differentiate between CUDA and non-CUDA code. There is a subtle difference between the two: __CUDA_ARCH__ answers the first question, and __CUDACC__ answers the second.
Consider the following code snippet:
#ifdef __CUDACC__
#warning using nvcc
template <typename T>
__global__ void add(T *x, T *y, T *z)
{
int idx = threadIdx.x + blockDim.x * blockIdx.x;
z[idx] = x[idx] + y[idx];
}
#ifdef __CUDA_ARCH__
#warning device code trajectory
#if __CUDA_ARCH__ > 120
#warning compiling with double precision
template void add<double>(double *, double *, double *);
#else
#warning compiling with single precision
template void add<float>(float *, float *, float *);
#endif
#else
#warning nvcc host code trajectory
#endif
#else
#warning non-nvcc code trajectory
#endif
Here we have a templated CUDA kernel with CUDA architecture dependent instantiation, a separate stanza for host code steered by nvcc, and a stanza for compilation of host code not steered by nvcc. This behaves as follows:
$ ln -s cudaarch.cu cudaarch.cc
$ gcc -c cudaarch.cc -o cudaarch.o
cudaarch.cc:26:2: warning: #warning non-nvcc code trajectory
$ nvcc -arch=sm_11 -Xptxas="-v" -c cudaarch.cu -o cudaarch.cu.o
cudaarch.cu:3:2: warning: #warning using nvcc
cudaarch.cu:14:2: warning: #warning device code trajectory
cudaarch.cu:19:2: warning: #warning compiling with single precision
cudaarch.cu:3:2: warning: #warning using nvcc
cudaarch.cu:23:2: warning: #warning nvcc host code trajectory
ptxas info : Compiling entry function '_Z3addIfEvPT_S1_S1_' for 'sm_11'
ptxas info : Used 4 registers, 12+16 bytes smem
$ nvcc -arch=sm_20 -Xptxas="-v" -c cudaarch.cu -o cudaarch.cu.o
cudaarch.cu:3:2: warning: #warning using nvcc
cudaarch.cu:14:2: warning: #warning device code trajectory
cudaarch.cu:16:2: warning: #warning compiling with double precision
cudaarch.cu:3:2: warning: #warning using nvcc
cudaarch.cu:23:2: warning: #warning nvcc host code trajectory
ptxas info : Compiling entry function '_Z3addIdEvPT_S1_S1_' for 'sm_20'
ptxas info : Used 8 registers, 44 bytes cmem[0]
The takeaway points from this are:
- __CUDACC__ defines whether nvcc is steering compilation or not
- __CUDA_ARCH__ is always undefined when compiling host code, steered by nvcc or not
- __CUDA_ARCH__ is only defined for the device code trajectory of compilation steered by nvcc
Those three pieces of information are always enough to have conditional compilation for device code for different CUDA architectures, host side CUDA code, and code not compiled by nvcc at all. The nvcc documentation is a bit terse at times, but all of this is covered in the discussion on compilation trajectories.
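Since the question also asks how to print the value of __CUDA_ARCH__, here is a minimal sketch (not part of the answer above) using the usual two-level stringize trick; it assumes the host compiler and the nvcc front end both honour #pragma message:

/* Sketch only: emit the value of __CUDA_ARCH__ as a compile-time message. */
#define STR_HELPER(x) #x
#define STR(x) STR_HELPER(x)

#ifdef __CUDA_ARCH__
#pragma message("__CUDA_ARCH__ = " STR(__CUDA_ARCH__))
#else
#pragma message("__CUDA_ARCH__ is not defined (host pass or non-nvcc compilation)")
#endif

#pragma message is used here rather than #warning because #warning does not macro-expand its message text.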
For the moment the only practical solution I see is using a custom define:
# if (!defined(__CUDACC__) || defined(USE_DOUBLE_IN_CUDA))
# define real double
# warning defining double for cuda or c/c++
# else
# define real float
# warning defining float for cuda
# endif
and then
nvcc -DUSE_DOUBLE_IN_CUDA -arch compute_13 -Ilibcutil testFloatDouble.cu
It outputs the following for the two compilation passes:
#warning defining double for cuda or c/c++
#warning defining double for cuda or c/c++
and
nvcc -Ilibcutil testFloatDouble.cu
does
#warning defining float for cuda
#warning defining float for cuda
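For completeness, a minimal sketch of how such a header and kernel could fit together (the file names mirror the question, but the kernel body below is illustrative, not the poster's actual code):

/* real.h -- sketch built around the USE_DOUBLE_IN_CUDA workaround above */
#if (!defined(__CUDACC__) || defined(USE_DOUBLE_IN_CUDA))
#define real double   /* plain C/C++, or CUDA compiled with -DUSE_DOUBLE_IN_CUDA */
#else
#define real float    /* CUDA compiled without the flag */
#endif

/* testFloatDouble.cu -- sketch of a kernel using the shared 'real' type */
#include "real.h"

__global__ void myKernel(real *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= (real)2;
}

Because nvcc forwards -D definitions to every compilation pass it performs, the host and device trajectories see the same real, so the host-side launch and the device-side kernel definition agree, which is what avoids the "Undefined symbols" mismatch from the question.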