In a .cu file I've tried the following in the global scope (i.e. not in a function):
__device__ static const double cdInf = HUGE_VAL / 4;
And got nvcc error:
error : dynamic initialization is not supported for __device__, __constant__ and __shared__ variables.
How to define a C++ const/constexpr on the device, if that's possible?
NOTE1: #define
is out of question not only for aesthetic reasons, but also because in practice the expression is more complex and involves an internal data type, not just double. So calling the constructor each time in each CUDA thread would be too expensive.
NOTE2: I doubt the performance of __constant__
because it's not a compile-time constant, but rather like a variable written with cudaMemcpyToSymbol
.
Use a constexpr __device__
function:
#include <stdio.h>
__device__ constexpr double cdInf() { return HUGE_VAL / 4; }
__global__ void print_cdinf() { printf("in kernel, cdInf() is %lf\n", cdInf()); }
int main() { print_cdinf<<<1, 1>>>(); return 0; }
The PTX should be something like:
.visible .entry print_cdinf()(
)
{
.reg .b64 %SP;
.reg .b64 %SPL;
.reg .b32 %r<2>;
.reg .b64 %rd<7>;
mov.u64 %rd6, __local_depot0;
cvta.local.u64 %SP, %rd6;
add.u64 %rd1, %SP, 0;
cvta.to.local.u64 %rd2, %rd1;
mov.u64 %rd3, 9218868437227405312;
st.local.u64 [%rd2], %rd3;
mov.u64 %rd4, $str;
cvta.global.u64 %rd5, %rd4;
// Callseq Start 0
{
.reg .b32 temp_param_reg;
// <end>}
.param .b64 param0;
st.param.b64 [param0+0], %rd5;
.param .b64 param1;
st.param.b64 [param1+0], %rd1;
.param .b32 retval0;
call.uni (retval0),
vprintf,
(
param0,
param1
);
ld.param.b32 %r1, [retval0+0];
//{
}// Callseq End 0
ret;
}
With no code for the constexpr function. You could also use a constexpr __host__
function, but that's experimental in CUDA 7: use the nvcc command-line options seems to be --expt-relaxed-constexpr
and see here for more details (thanks @harrism).
To make the code you have shown compile and work as expected, you need to initialize the variable at runtime, not compile time. To do this, add a host side call to cudaMemcpyToSymbol
, something like:
__device__ double cdInf;
// ...
double val = HUGE_VAL / 4
cudaMemcpyToSymbol(cdInf, &val, sizeof(double));
However, for a single value, passing it as a kernel argument would seem far more sensible. The compiler will automagically store the argument in constant memory on all supported architectures, and there is a "free" constant cache broadcast mechanism which should make the cost of accessing the value at runtime negligible.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With