
CUDA's cudaMemcpyToSymbol() throws "invalid argument" error

Tags: c, cuda

The problem

I'm trying to copy an int array into the device's constant memory, but I keep getting the following error:

[ERROR] 'invalid argument' (11) in 'main.cu' at line '386'

The code

There's a lot of code, so I'm going to simplify what I have.

I've declared a device __constant__ variable at the top section of my main.cu file, outside any function.

__device__ __constant__ int* dic;

I also have a host variable, flatDic, that's malloc'ed the following way, inside main():

int* flatDic = (int *)malloc(num_codewords*(bSizeY*bSizeX)*sizeof(int));

Then I try to copy the contents of flatDic into dic like this, also in main():

cudaMemcpyToSymbol(dic, flatDic, num_codewords*(bSizeY*bSizeX)*sizeof(int));

This cudaMemcpyToSymbol() call is line 386 of main.cu, and it's where the aforementioned error is thrown.

What I've tried

Here's what I've tried so far to solve the problem. All of the following calls return the same error:

cudaMemcpyToSymbol(dic, &flatDic, num_codewords*(bSizeY*bSizeX)*sizeof(int));

cudaMemcpyToSymbol(dic, flatDic, num_codewords*(bSizeY*bSizeX)*sizeof(int));

cudaMemcpyToSymbol(dic, &flatDic, num_codewords*(bSizeY*bSizeX)*sizeof(int), 0, cudaMemcpyHostToDevice);

cudaMemcpyToSymbol(dic, flatDic, num_codewords*(bSizeY*bSizeX)*sizeof(int), 0, cudaMemcpyHostToDevice);

I've also tried to cudaMalloc() the dic variable before calling cudaMemcpyToSymbol(). cudaMalloc() throws no errors, but the cudaMemcpyToSymbol() error persists.

cudaMalloc((void **) &dic, num_codewords*(bSizeY*bSizeX)*sizeof(int));

I've also searched extensively through the web, documentation, forums, examples, etc., all to no avail.

Does anyone see anything wrong with my code? Thanks in advance.

Telmo Marques asked Mar 12 '12 23:03


1 Answer

cudaMemcpyToSymbol copies to a device symbol, and here the symbol dic is a single pointer of type int *, not an array. You're trying to copy a whole allocated ARRAY of int (num_codewords*(bSizeY*bSizeX)*sizeof(int) bytes) into a variable that is only sizeof(int *) bytes wide. The types and sizes don't match, hence the invalid argument error. To make this work, you need to copy your (allocated) host ARRAY of int into a statically sized, constant device ARRAY of int, e.g.:

__device__ __constant__ int dic[LEN];

Example from the CUDA C Programming Guide (which I suggest you read -- it's quite good!):

__constant__ float constData[256];
float data[256];
cudaMemcpyToSymbol(constData, data, sizeof(data));
cudaMemcpyFromSymbol(data, constData, sizeof(data));
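Applied to the question, the same pattern might look like the following minimal sketch. MAX_DIC_INTS, copyOut and roundTripThroughConstant are hypothetical names that are not in the original code, and the capacity of 4096 ints is an assumption; the real bound has to cover num_codewords*(bSizeY*bSizeX) and still fit in the device's constant memory (64 KB in total).

```cuda
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical compile-time capacity (16 KB); a __constant__ array needs a
// size known at compile time, so it cannot be sized with malloc or cudaMalloc.
#define MAX_DIC_INTS 4096

__constant__ int dic[MAX_DIC_INTS];

// Each thread copies one entry of the constant array into global memory.
__global__ void copyOut(int *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = dic[i];
}

// Uploads n ints into dic, reads them back through a kernel, and returns the
// number of mismatches. Returns -1 if n does not fit or any call fails.
int roundTripThroughConstant(int n)
{
    if (n <= 0 || n > MAX_DIC_INTS) return -1;

    size_t bytes = (size_t)n * sizeof(int);
    int *flatDic = (int *)malloc(bytes);
    int *result  = (int *)malloc(bytes);
    int *d_out   = NULL;
    int mismatches = -1;

    if (flatDic != NULL && result != NULL) {
        for (int i = 0; i < n; ++i) flatDic[i] = 3 * i + 1;  // dummy data

        // Pass the symbol itself (not its address) and the host array.
        if (cudaMemcpyToSymbol(dic, flatDic, bytes) == cudaSuccess &&
            cudaMalloc((void **)&d_out, bytes) == cudaSuccess) {
            copyOut<<<(n + 255) / 256, 256>>>(d_out, n);
            // cudaMemcpy waits for the kernel on the default stream to finish.
            if (cudaMemcpy(result, d_out, bytes,
                           cudaMemcpyDeviceToHost) == cudaSuccess) {
                mismatches = 0;
                for (int i = 0; i < n; ++i)
                    if (result[i] != flatDic[i]) ++mismatches;
            }
        }
    }

    cudaFree(d_out);   // cudaFree(NULL) is a harmless no-op
    free(flatDic);
    free(result);
    return mismatches;
}
```

Calling roundTripThroughConstant(1000) from main() should return 0 on a working device, while a request larger than MAX_DIC_INTS is rejected on the host before any CUDA call is made.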

To my knowledge, you could also keep a pointer in constant memory and cudaMemcpyToSymbol a device pointer value into it (unlike your example, where you're copying an array into a pointer). Beware that only the pointer itself will be constant, not the memory it points to on your device. If you go this route, you need to cudaMalloc a device buffer, cudaMemcpy your data into it, and then cudaMemcpyToSymbol the resulting device pointer into your __constant__ device variable. AGAIN, in this case the array values WILL NOT be constant -- ONLY the pointer to the memory will be.

Your call for this case would be something like:

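// c_dic_ptr is assumed to be declared at file scope as: __constant__ int *c_dic_ptr;
int * d_dic;
cudaMalloc((void **) &d_dic, num_codewords*(bSizeY*bSizeX)*sizeof(int));
cudaMemcpy(d_dic, flatDic, num_codewords*(bSizeY*bSizeX)*sizeof(int), cudaMemcpyHostToDevice);
cudaMemcpyToSymbol(c_dic_ptr, &d_dic, sizeof(int *));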
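For completeness, here is a minimal self-contained sketch of this pointer-in-constant-memory route. The names c_dic_ptr, sumDic and sumViaConstantPointer are hypothetical, and the single-thread summing kernel exists only to show that the device can read the data through the constant pointer. The function returns -1 on failure, so it is only meaningful for inputs whose sum is not -1.

```cuda
#include <cuda_runtime.h>

// Only this pointer lives in constant memory; the ints it points to are
// ordinary global memory allocated with cudaMalloc.
__constant__ int *c_dic_ptr;

// A single thread sums n ints read through the constant pointer.
__global__ void sumDic(int *out, int n)
{
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        int s = 0;
        for (int i = 0; i < n; ++i) s += c_dic_ptr[i];
        *out = s;
    }
}

// Copies flatDic to the device, publishes the device address through
// c_dic_ptr, and returns the sum computed on the GPU (-1 on any error).
int sumViaConstantPointer(const int *flatDic, int n)
{
    int *d_dic = NULL;
    int *d_out = NULL;
    int sum = -1;
    size_t bytes = (size_t)n * sizeof(int);

    if (cudaMalloc((void **)&d_dic, bytes) == cudaSuccess &&
        cudaMalloc((void **)&d_out, sizeof(int)) == cudaSuccess &&
        cudaMemcpy(d_dic, flatDic, bytes,
                   cudaMemcpyHostToDevice) == cudaSuccess &&
        // Copy the pointer VALUE: source is &d_dic, size is sizeof(int *).
        cudaMemcpyToSymbol(c_dic_ptr, &d_dic, sizeof(int *)) == cudaSuccess) {
        sumDic<<<1, 1>>>(d_out, n);
        if (cudaMemcpy(&sum, d_out, sizeof(int),
                       cudaMemcpyDeviceToHost) != cudaSuccess)
            sum = -1;
    }

    cudaFree(d_dic);
    cudaFree(d_out);
    return sum;
}
```

Note that the size passed to cudaMemcpyToSymbol is sizeof(int *), which matches the size of the symbol exactly; the array data itself travels through the ordinary cudaMemcpy.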

Also, while debugging you should wrap your CUDA calls in error-checking logic. I've borrowed the following logic from talonmies:

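#include <cstdio>
#include <cstdlib>

// file is const char * because __FILE__ expands to a string literal.
inline void gpuAssert(cudaError_t code, const char *file, int line,
                      bool abort=true)
{
   if (code != cudaSuccess)
   {
      // Report the CUDA error string with the file and line of the failing call.
      fprintf(stderr,"GPUassert: %s %s %d\n", cudaGetErrorString(code),
          file, line);
      if (abort) exit(code);
   }
}

#define gpuErrchk(ans) { gpuAssert((ans), __FILE__, __LINE__); }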

To use it, simply wrap your CUDA call in it like so:

gpuErrchk(cudaMemcpyToSymbol(dic, flatDic, num_codewords*(bSizeY*bSizeX)*sizeof(int)));

The program will exit with an error message if you're having allocation issues or other common errors.
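To see what such a check reports for the call in the question, here is a minimal sketch that reproduces the failure without exiting. The names dicPtr and oversizedSymbolCopy are hypothetical, and the function returns the error code rather than aborting, purely so the result can be inspected.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Same shape as the declaration in the question: one pointer, not an array.
__constant__ int *dicPtr;

// Tries to copy 64 ints into the pointer-sized symbol and returns the
// resulting error code instead of exiting.
cudaError_t oversizedSymbolCopy(void)
{
    int host[64] = {0};
    // sizeof(host) is 256 bytes, far more than the symbol can hold.
    cudaError_t err = cudaMemcpyToSymbol(dicPtr, host, sizeof(host));
    if (err != cudaSuccess)
        fprintf(stderr, "copy failed: %s\n", cudaGetErrorString(err));
    cudaGetLastError();  // reset the recorded error so later checks start clean
    return err;
}
```

The returned code should be cudaErrorInvalidValue, whose error string is the "invalid argument" message quoted in the question.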

To check your kernel, do something like:

MyKernel<<<BLK,THRD>>>(vars...);

//Make sure nothing went wrong.
gpuErrchk(cudaPeekAtLastError());
gpuErrchk(cudaDeviceSynchronize());
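The two checks catch different kinds of failure: the first reports errors detected at launch time, the second reports errors raised while the kernel runs. A minimal sketch of the launch-time case follows, assuming a device whose limit is 1024 threads per block (true of current GPUs). The names noop and launchAndCheck are hypothetical, and cudaGetLastError is used in place of cudaPeekAtLastError so the recorded error is reset after it is read.

```cuda
#include <cuda_runtime.h>

// Empty kernel: only the launch itself is of interest here.
__global__ void noop() {}

// Launches noop with the requested block size and returns the first error.
cudaError_t launchAndCheck(int threadsPerBlock)
{
    noop<<<1, threadsPerBlock>>>();

    // Launch-time problems (such as a bad configuration) are reported here.
    // Unlike cudaPeekAtLastError, cudaGetLastError also resets the error.
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) return err;

    // Errors that occur while the kernel runs surface once it has finished.
    return cudaDeviceSynchronize();
}
```

With a valid block size such as 32 the function should return cudaSuccess, while an oversized block such as 4096 threads should be rejected at launch with cudaErrorInvalidConfiguration.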

Thanks to talonmies for the error checking code!

Note:
Even if you were doing a vanilla cudaMemcpy, your code would fail, as you haven't cudaMalloc'ed memory for your array. In that case, though, the failure would likely be the GPU equivalent of a segfault (likely "unspecified launch failure"), as the pointer would hold some sort of junk value and you would be trying to write to the memory at the address given by that junk value.

Jason R. Mick answered Sep 28 '22 04:09