How to define CUDA device constant like a C++ const/constexpr?

Question

In a .cu file I've tried the following in the global scope (i.e. not in a function):

__device__ static const double cdInf = HUGE_VAL / 4;

And got nvcc error:

error : dynamic initialization is not supported for __device__, __constant__ and __shared__ variables.

How to define a C++ const/constexpr on the device, if that's possible?

NOTE1: #define is out of question not only for aesthetic reasons, but also because in practice the expression is more complex and involves an internal data type, not just double. So calling the constructor each time in each CUDA thread would be too expensive.

NOTE2: I doubt the performance of __constant__ because it's not a compile-time constant, but rather like a variable written with cudaMemcpyToSymbol.

einpoklum · Accepted Answer

Use a constexpr __device__ function:

#include <stdio.h>
__device__ constexpr double cdInf() { return HUGE_VAL / 4; }
__global__ void print_cdinf() { printf("in kernel, cdInf() is %lf
", cdInf()); }
int main() { print_cdinf<<<1, 1>>>(); return 0; }

The PTX should be something like:

.visible .entry print_cdinf()(

)
{
        .reg .b64       %SP;
        .reg .b64       %SPL;
        .reg .b32       %r<2>;
        .reg .b64       %rd<7>;


        mov.u64         %rd6, __local_depot0;
        cvta.local.u64  %SP, %rd6;
        add.u64         %rd1, %SP, 0;
        cvta.to.local.u64       %rd2, %rd1;
        mov.u64         %rd3, 9218868437227405312;
        st.local.u64    [%rd2], %rd3;
        mov.u64         %rd4, $str;
        cvta.global.u64         %rd5, %rd4;
        // Callseq Start 0
        {
        .reg .b32 temp_param_reg;
        // <end>}
        .param .b64 param0;
        st.param.b64    [param0+0], %rd5;
        .param .b64 param1;
        st.param.b64    [param1+0], %rd1;
        .param .b32 retval0;
        call.uni (retval0), 
        vprintf, 
        (
        param0, 
        param1
        );
        ld.param.b32    %r1, [retval0+0];

        //{
        }// Callseq End 0
        ret;
}

With no code for the constexpr function. You could also use a constexpr __host__ function, but that's experimental in CUDA 7: use the nvcc command-line options seems to be --expt-relaxed-constexpr and see here for more details (thanks @harrism).

talonmies · Answer

To make the code you have shown compile and work as expected, you need to initialize the variable at runtime, not compile time. To do this, add a host side call to cudaMemcpyToSymbol, something like:

__device__ double cdInf;

// ...

double val = HUGE_VAL / 4
cudaMemcpyToSymbol(cdInf, &val, sizeof(double));

However, for a single value, passing it as a kernel argument would seem far more sensible. The compiler will automagically store the argument in constant memory on all supported architectures, and there is a "free" constant cache broadcast mechanism which should make the cost of accessing the value at runtime negligible.

How to define CUDA device constant like a C++ const/constexpr?

Tags:

c++

constants

cuda

compile-time-constant

Serge Rogatch

2 Answers

einpoklum

talonmies

Recent Activity

Donate For Us

How to define CUDA device constant like a C++ const/constexpr?

Tags:

c++

constants

cuda

compile-time-constant

Serge Rogatch

2 Answers

einpoklum

talonmies

Related questions

Recent Activity

Donate For Us