Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to define CUDA device constant like a C++ const/constexpr?

In a .cu file I've tried the following in the global scope (i.e. not in a function):

__device__ static const double cdInf = HUGE_VAL / 4;

And got nvcc error:

error : dynamic initialization is not supported for __device__, __constant__ and __shared__ variables.

How to define a C++ const/constexpr on the device, if that's possible?

NOTE1: #define is out of question not only for aesthetic reasons, but also because in practice the expression is more complex and involves an internal data type, not just double. So calling the constructor each time in each CUDA thread would be too expensive.

NOTE2: I doubt the performance of __constant__ because it's not a compile-time constant, but rather like a variable written with cudaMemcpyToSymbol.

like image 436
Serge Rogatch Avatar asked Sep 12 '16 08:09

Serge Rogatch


2 Answers

Use a constexpr __device__ function:

#include <stdio.h>
__device__ constexpr double cdInf() { return HUGE_VAL / 4; }
__global__ void print_cdinf() { printf("in kernel, cdInf() is %lf\n", cdInf()); }
int main() { print_cdinf<<<1, 1>>>(); return 0; }

The PTX should be something like:

.visible .entry print_cdinf()(

)
{
        .reg .b64       %SP;
        .reg .b64       %SPL;
        .reg .b32       %r<2>;
        .reg .b64       %rd<7>;


        mov.u64         %rd6, __local_depot0;
        cvta.local.u64  %SP, %rd6;
        add.u64         %rd1, %SP, 0;
        cvta.to.local.u64       %rd2, %rd1;
        mov.u64         %rd3, 9218868437227405312;
        st.local.u64    [%rd2], %rd3;
        mov.u64         %rd4, $str;
        cvta.global.u64         %rd5, %rd4;
        // Callseq Start 0
        {
        .reg .b32 temp_param_reg;
        // <end>}
        .param .b64 param0;
        st.param.b64    [param0+0], %rd5;
        .param .b64 param1;
        st.param.b64    [param1+0], %rd1;
        .param .b32 retval0;
        call.uni (retval0), 
        vprintf, 
        (
        param0, 
        param1
        );
        ld.param.b32    %r1, [retval0+0];

        //{
        }// Callseq End 0
        ret;
}

With no code for the constexpr function. You could also use a constexpr __host__ function, but that's experimental in CUDA 7: use the nvcc command-line options seems to be --expt-relaxed-constexpr and see here for more details (thanks @harrism).

like image 127
einpoklum Avatar answered Sep 21 '22 02:09

einpoklum


To make the code you have shown compile and work as expected, you need to initialize the variable at runtime, not compile time. To do this, add a host side call to cudaMemcpyToSymbol, something like:

__device__ double cdInf;

// ...

double val = HUGE_VAL / 4
cudaMemcpyToSymbol(cdInf, &val, sizeof(double));

However, for a single value, passing it as a kernel argument would seem far more sensible. The compiler will automagically store the argument in constant memory on all supported architectures, and there is a "free" constant cache broadcast mechanism which should make the cost of accessing the value at runtime negligible.

like image 26
talonmies Avatar answered Sep 19 '22 02:09

talonmies