Usage of global vs. constant memory in CUDA

Tags: memory, cuda

Hey there, I have the following piece of code:

#if USE_CONST == 1
    __constant__ double PNT[ SIZE ];    
#else
    __device__ double *PNT;
#endif

and a bit later I have:

#if USE_CONST == 0
    cudaMalloc((void **)&PNT, sizeof(double)*SIZE);
    cudaMemcpy(PNT, point, sizeof(double)*SIZE, cudaMemcpyHostToDevice);
#else
    cudaMemcpyToSymbol(PNT, point, sizeof(double)*SIZE);
#endif

where point is defined earlier in the code. With USE_CONST=1 everything works as expected, but with USE_CONST=0 it does not. I access the array in my kernel function via

PNT[ index ]

What is the difference between the two variants that causes this? Thanks!

asked May 17 '11 23:05 by tim


1 Answer

The correct usage of cudaMemcpyToSymbol prior to CUDA 4.0 is:

cudaMemcpyToSymbol("PNT", point, sizeof(double)*SIZE);

or alternatively:

double *cpnt;
cudaGetSymbolAddress((void **)&cpnt, "PNT");
cudaMemcpy(cpnt, point, sizeof(double)*SIZE, cudaMemcpyHostToDevice);

which might be a bit faster if you are planning to access the symbol from the host API more than once.
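For readers on newer toolkits, here is a minimal end-to-end sketch of the constant memory variant. It passes the symbol itself rather than the string "PNT", since the string form was removed in CUDA 5.0. The value of SIZE, the kernel readPnt, and the helper copyThroughConstant are assumptions made up for illustration; they are not from the question.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define SIZE 8  // assumed array length, for illustration only

__constant__ double PNT[SIZE];

// Hypothetical kernel: each thread reads one element of PNT.
__global__ void readPnt(double *out)
{
    int index = blockIdx.x * blockDim.x + threadIdx.x;
    if (index < SIZE)
        out[index] = PNT[index];
}

// Hypothetical helper: copies `point` into PNT, launches the kernel,
// and returns the values the kernel read back in `result`.
bool copyThroughConstant(const double *point, double *result)
{
    const size_t bytes = sizeof(double) * SIZE;
    double *cpnt = NULL;   // device address of the PNT symbol
    double *d_out = NULL;  // device output buffer

    // The symbol is passed directly, not as the string "PNT".
    if (cudaGetSymbolAddress((void **)&cpnt, PNT) != cudaSuccess) return false;
    // The looked-up address can be reused for any later host-side copies.
    if (cudaMemcpy(cpnt, point, bytes, cudaMemcpyHostToDevice) != cudaSuccess) return false;
    if (cudaMalloc((void **)&d_out, bytes) != cudaSuccess) return false;

    readPnt<<<1, SIZE>>>(d_out);
    cudaError_t err = cudaMemcpy(result, d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return err == cudaSuccess;
}

int main()
{
    double point[SIZE], result[SIZE];
    for (int i = 0; i < SIZE; ++i) point[i] = 0.5 * i;

    if (!copyThroughConstant(point, result)) {
        fprintf(stderr, "CUDA call failed\n");
        return 1;
    }
    for (int i = 0; i < SIZE; ++i)
        printf("PNT[%d] = %g\n", i, result[i]);
    return 0;
}
```

If the symbol is only written once, a single cudaMemcpyToSymbol(PNT, point, sizeof(double)*SIZE) call does the same job without the address lookup.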

EDIT: I misunderstood the question. For the global memory version, do something similar to the second version for constant memory:

double *gpnt;
cudaGetSymbolAddress((void **)&gpnt, "PNT");
cudaMemcpy(gpnt, point, sizeof(double)*SIZE, cudaMemcpyHostToDevice);
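One caveat worth spelling out: with PNT declared as __device__ double *PNT, the symbol itself stores only a single pointer, so copying SIZE doubles to the symbol's address would overrun it. Likewise, taking &PNT in host code (as the question's cudaMalloc call does) does not refer to the device-side variable. A commonly used alternative, sketched below under assumptions, is to allocate through an ordinary host-side pointer and then copy that pointer value into the symbol. The names d_pnt, readPnt, and copyThroughGlobal and the value of SIZE are hypothetical, and the symbol is passed directly rather than as a string, as required since CUDA 5.0.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define SIZE 8  // assumed array length, for illustration only

__device__ double *PNT;  // device-side pointer variable, as in the question

// Hypothetical kernel: each thread reads one element through PNT.
__global__ void readPnt(double *out)
{
    int index = blockIdx.x * blockDim.x + threadIdx.x;
    if (index < SIZE)
        out[index] = PNT[index];
}

// Hypothetical helper: puts `point` in global memory, makes PNT point at it,
// launches the kernel, and returns the values the kernel read in `result`.
bool copyThroughGlobal(const double *point, double *result)
{
    const size_t bytes = sizeof(double) * SIZE;
    double *d_pnt = NULL;  // host-side variable holding a device address
    double *d_out = NULL;  // device output buffer
    bool ok = false;

    if (cudaMalloc((void **)&d_pnt, bytes) == cudaSuccess &&
        cudaMalloc((void **)&d_out, bytes) == cudaSuccess &&
        cudaMemcpy(d_pnt, point, bytes, cudaMemcpyHostToDevice) == cudaSuccess &&
        // Copy the pointer value (not the data) into the device symbol.
        cudaMemcpyToSymbol(PNT, &d_pnt, sizeof(d_pnt)) == cudaSuccess) {
        readPnt<<<1, SIZE>>>(d_out);
        ok = cudaMemcpy(result, d_out, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }

    cudaFree(d_out);  // cudaFree accepts NULL, so partial failure is safe here
    cudaFree(d_pnt);
    return ok;
}

int main()
{
    double point[SIZE], result[SIZE];
    for (int i = 0; i < SIZE; ++i) point[i] = 0.5 * i;

    if (!copyThroughGlobal(point, result)) {
        fprintf(stderr, "CUDA call failed\n");
        return 1;
    }
    for (int i = 0; i < SIZE; ++i)
        printf("PNT[%d] = %g\n", i, result[i]);
    return 0;
}
```

The kernel code stays the same as in the constant memory variant, because PNT[ index ] works for both an array and a pointer. Only the host-side setup differs.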
answered Sep 23 '22 13:09 by talonmies