CUDA host and device using same __constant__ memory

Tags:

cuda

I have a device/host function that uses constant memory. It runs OK on the device, but on the host this memory seems to remain uninitialized.

#include <iostream>
#include <stdio.h>


const __constant__ double vals[2] = { 0.0, 1000.0 };

__device__ __host__ double f(size_t i)
{
    return vals[i];
}

__global__ void kern()
{
    printf("vals[%d] = %lf\n", threadIdx.x, vals[threadIdx.x]);
}

int main() {
    std::cerr << f(0) << " " << f(1) << std::endl;
    kern<<<1, 2>>>();
    cudaThreadSynchronize();
}

This prints (requires CC 2.0 or above)

0 0
vals[0] = 0.000000
vals[1] = 1000.000000

What is the problem and how can I get both device and host memory constants initialized simultaneously?

Asked Feb 26 '12 by davinchi

3 Answers

Since CygnusX1 misunderstood what I meant in my comment on MurphEngineer's answer, maybe I should post my own answer. What I meant was this:

__constant__ double dc_vals[2] = { 0.0, 1000.0 };
       const double hc_vals[2] = { 0.0, 1000.0 };

__device__ __host__ double f(size_t i)
{
#ifdef __CUDA_ARCH__
    return dc_vals[i];
#else
    return hc_vals[i];
#endif
}

This has the same result as CygnusX1's approach, but it is more flexible for real code: it lets you have runtime-defined values in your constant arrays, for example, and lets you use CUDA API functions like cudaMemcpyToSymbol/cudaMemcpyFromSymbol on the __constant__ array.

A more realistic complete example:

#include <iostream>
#include <stdio.h>

__constant__ double dc_vals[2];
       double hc_vals[2];   // not const: it is filled in at runtime in main()

__device__ __host__ double f(size_t i)
{
#ifdef __CUDA_ARCH__
    return dc_vals[i];
#else
    return hc_vals[i];
#endif
}

__global__ void kern()
{
    printf("vals[%d] = %lf\n", threadIdx.x, vals[threadIdx.x]);
}

int main() {
    hc_vals[0] = 0.0;
    hc_vals[1] = 1000.0;

    cudaMemcpyToSymbol(dc_vals, hc_vals, 2 * sizeof(double), 0, cudaMemcpyHostToDevice);

    std::cerr << f(0) << " " << f(1) << std::endl;
    kern<<<1, 2>>>();
    cudaThreadSynchronize();
}
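
As a side note, since the answer also mentions cudaMemcpyFromSymbol, here is a minimal sketch of copying the __constant__ array back to the host to verify the upload. It extends main() above; the hc_check buffer name is just an illustration, not part of the original answer.

    // After the cudaMemcpyToSymbol() call in main() above:
    double hc_check[2] = { -1.0, -1.0 };   // scratch buffer (hypothetical name)
    // Copy the contents of the device __constant__ array back to the host.
    cudaMemcpyFromSymbol(hc_check, dc_vals, 2 * sizeof(double));
    std::cerr << "device copy: " << hc_check[0] << " " << hc_check[1] << std::endl;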
Answered by harrism

I think MurphEngineer explained well why it does not work.

To quickly fix this problem, you can follow harrism's idea, something like this:

#ifdef __CUDA_ARCH__
#define CONSTANT __constant__
#else
#define CONSTANT
#endif

const CONSTANT double vals[2] = { 0.0, 1000.0 };

This way host compilation creates a normal host const array, while device compilation creates a device __constant__ array.

Do note that with this trick it may be harder to use the CUDA API to access that device array with functions like cudaMemcpyToSymbol(), if you ever decide to do so.
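
For completeness, here is a small self-contained sketch of how the macro trick fits into the question's original example (my own illustration, not code from the answer): f() no longer needs any #ifdef, because each compilation pass sees its own flavor of vals.

#include <iostream>
#include <stdio.h>

#ifdef __CUDA_ARCH__
#define CONSTANT __constant__
#else
#define CONSTANT
#endif

// Host compilation sees a plain const array; device compilation sees __constant__ memory.
const CONSTANT double vals[2] = { 0.0, 1000.0 };

__device__ __host__ double f(size_t i)
{
    return vals[i];   // no #ifdef needed here
}

__global__ void kern()
{
    printf("vals[%d] = %lf\n", threadIdx.x, vals[threadIdx.x]);
}

int main() {
    std::cerr << f(0) << " " << f(1) << std::endl;   // host path reads the const array
    kern<<<1, 2>>>();                                // device path reads __constant__ memory
    cudaDeviceSynchronize();
}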

Answered by CygnusX1

Using the __constant__ qualifier explicitly allocates that memory on the device. There is no way to access that memory from the host -- not even with the new CUDA Unified Addressing stuff (that only works for memory allocated with cudaMalloc() and its friends). Qualifying the variable with const just says "this is a constant pointer to (...)".

The correct way to do this is, indeed, to have two arrays: one on the host, and one on the device. Initialize your host array, then use cudaMemcpyToSymbol() to copy data to the device array at runtime. For more information on how to do this, see this thread: http://forums.nvidia.com/index.php?showtopic=69724
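A minimal sketch of that two-array pattern with basic error checking (my own illustration; d_vals, h_vals, and upload_constants are hypothetical names, not taken from the linked thread):

#include <stdio.h>

__constant__ double d_vals[2];         // device-side constant array
double h_vals[2] = { 0.0, 1000.0 };    // host-side array, initialized normally

void upload_constants()
{
    // Copy the initialized host array into the device __constant__ symbol.
    cudaError_t err = cudaMemcpyToSymbol(d_vals, h_vals, sizeof(h_vals));
    if (err != cudaSuccess)
        fprintf(stderr, "cudaMemcpyToSymbol failed: %s\n", cudaGetErrorString(err));
}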

Answered by mtrberzi