In host code, it seems that the __CUDA_ARCH__ macro won't generate different code paths; instead, it generates code for exactly the code path of the current device.
However, if __CUDA_ARCH__ is used within device code, it generates a different code path for each device architecture specified in the compilation options (-arch).
Can anyone confirm that this is correct?
When used in device code, __CUDA_ARCH__ is defined as a number that reflects the virtual architecture currently being compiled for (for example, 350 when compiling for compute_35).
It is not intended to be used in host code. From the nvcc manual:
This macro can be used in the implementation of GPU functions for determining the virtual architecture for which it is currently being compiled. The host code (the non-GPU code) must not depend on it.
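As a rough illustration of the device-side behaviour described above, the sketch below has a kernel store the value of __CUDA_ARCH__ that its compilation pass saw. The names compiled_arch_kernel and query_compiled_arch are hypothetical and exist only for this example; the guard is needed because the host pass also parses the kernel body, and the macro is not defined there.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: stores the architecture value this device pass was compiled for.
__global__ void compiled_arch_kernel(int *out) {
#ifdef __CUDA_ARCH__
    // Only device compilation passes see this line; each pass has its own value.
    *out = __CUDA_ARCH__;
#endif
}

// Hypothetical host helper: returns the value written by the kernel,
// or -1 if any CUDA call fails (for example, when no GPU is available).
int query_compiled_arch() {
    int *d_out = nullptr;
    int h_out = -1;
    if (cudaMalloc(&d_out, sizeof(int)) != cudaSuccess) return -1;
    // Initialise the device value so a failed launch is never mistaken for a result.
    cudaMemcpy(d_out, &h_out, sizeof(int), cudaMemcpyHostToDevice);
    compiled_arch_kernel<<<1, 1>>>(d_out);
    cudaError_t launch_err = cudaGetLastError();
    cudaError_t copy_err = cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return (launch_err == cudaSuccess && copy_err == cudaSuccess) ? h_out : -1;
}
```

If the file is built for more than one target (for example, with several -gencode options), the kernel body is compiled once per target, so the value returned depends on which compiled version the runtime selects for the installed GPU.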
Because the manual says host code must not depend on it, the behaviour of __CUDA_ARCH__ in host code is undefined (at least by CUDA). As pointed out by @tera in the comments, the macro is simply not defined during host compilation, so it can be used to differentiate host and device paths, for example in a __host__ __device__ function definition:
#ifndef __CUDA_ARCH__
//host code here
#else
//device code here
#endif
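The pattern above can be fleshed out into a small, self-contained sketch. The function and kernel names here (compiled_for_device, probe_kernel, probe_device_side) are made up for illustration and are not part of any CUDA API; only the body of the function depends on the macro, not its signature.

```cuda
#include <cuda_runtime.h>

// Hypothetical function: reports which compilation pass produced the code being run.
__host__ __device__ int compiled_for_device() {
#ifndef __CUDA_ARCH__
    return 0;  // host pass: the macro is not defined
#else
    return 1;  // device pass: the macro is defined
#endif
}

// Hypothetical kernel: calls the same function from device code.
__global__ void probe_kernel(int *out) {
    *out = compiled_for_device();
}

// Hypothetical host helper: returns what the device-side call produced,
// or -1 if any CUDA call fails (for example, when no GPU is available).
int probe_device_side() {
    int *d_out = nullptr;
    int h_out = -1;
    if (cudaMalloc(&d_out, sizeof(int)) != cudaSuccess) return -1;
    // Initialise the device value so a failed launch is never mistaken for a result.
    cudaMemcpy(d_out, &h_out, sizeof(int), cudaMemcpyHostToDevice);
    probe_kernel<<<1, 1>>>(d_out);
    cudaError_t launch_err = cudaGetLastError();
    cudaError_t copy_err = cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return (launch_err == cudaSuccess && copy_err == cudaSuccess) ? h_out : -1;
}
```

Calling compiled_for_device() directly from host code takes the host branch, while probe_device_side() reports the branch taken when the same function is called from a kernel.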