CUDA Double pointer memory copy [duplicate]

Tags:

cuda

I wrote my sample code like this.

int ** d_ptr;
cudaMalloc( (void**)&d_ptr, sizeof(int*)*N );

int* tmp_ptr[N];
for(int i=0; i<N; i++)
    cudaMalloc( (void**)&tmp_ptr[i], sizeof(int)*SIZE );
cudaMemcpy(d_ptr, tmp_ptr, sizeof(tmp_ptr), cudaMemcpyHostToDevice);

This code works, but after launching the kernel I can't copy the result back to the host.

int* Mtx_on_GPU[N];
cudaMemcpy(Mtx_on_GPU, d_ptr, sizeof(int)*N*SIZE, cudaMemcpyDeviceToHost);

At this point a segmentation fault occurs, but I don't know what I'm doing wrong.

int* Mtx_on_GPU[N];
for(int i=0; i<N; i++)
    cudaMemcpy(Mtx_on_GPU[i], d_ptr[i], sizeof(int)*SIZE, cudaMemcpyDeviceToHost);

This code produces the same error.

I'm sure my code contains a mistake, but I have spent the whole day without finding it.

Any advice would be appreciated.

Umbrella asked May 12 '14 12:05


1 Answer

In the last line

cudaMemcpy(Mtx_on_GPU[i], d_ptr[i], sizeof(int)*SIZE, cudaMemcpyDeviceToHost);

you are trying to copy data from the device to the host (NOTE: I assume that you allocated host memory for the Mtx_on_GPU pointers!)

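However, the array that d_ptr points to is stored in device memory, so you can't access its elements directly from the host side: evaluating d_ptr[i] in host code dereferences a device address, which is what causes the segmentation fault. The same device pointers are still available on the host in tmp_ptr, so the line should be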

cudaMemcpy(Mtx_on_GPU[i], tmp_ptr[i], sizeof(int)*SIZE, cudaMemcpyDeviceToHost);

This may become clearer when using "overly elaborate" variable names:

int ** devicePointersStoredInDeviceMemory;
cudaMalloc( (void**)&devicePointersStoredInDeviceMemory, sizeof(int*)*N);

int* devicePointersStoredInHostMemory[N];
for(int i=0; i<N; i++)
    cudaMalloc( (void**)&devicePointersStoredInHostMemory[i], sizeof(int)*SIZE );

cudaMemcpy(
    devicePointersStoredInDeviceMemory, 
    devicePointersStoredInHostMemory,
    sizeof(int*)*N, cudaMemcpyHostToDevice);

// Invoke kernel here, passing "devicePointersStoredInDeviceMemory"
// as an argument
...

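int* hostPointersStoredInHostMemory[N];
for(int i=0; i<N; i++) {
    // Allocate host memory for row i and keep the pointer in the host array
    hostPointersStoredInHostMemory[i] = (int*)malloc(sizeof(int)*SIZE);
    int* hostPointer = hostPointersStoredInHostMemory[i];

    // The device address of row i, read from the host-side copy of the pointers
    int* devicePointer = devicePointersStoredInHostMemory[i];

    cudaMemcpy(hostPointer, devicePointer, sizeof(int)*SIZE, cudaMemcpyDeviceToHost);
}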
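The steps above can be put together into one function with basic error checking. This is only a minimal sketch under assumptions: the values of N and SIZE, the kernel fillRows, and the function name roundTripRows are hypothetical and do not come from the question. It reuses a single host buffer for the copy-back, which is enough to check the result.

```cuda
#include <cuda_runtime.h>

// Hypothetical sizes; the question does not state N or SIZE.
constexpr int N = 4;
constexpr int SIZE = 8;

// Hypothetical kernel: one block per row, one thread per element.
// Each element receives the value row * SIZE + col.
__global__ void fillRows(int** rows) {
    int row = blockIdx.x;
    int col = threadIdx.x;
    if (row < N && col < SIZE) {
        rows[row][col] = row * SIZE + col;
    }
}

// Returns true if every CUDA call succeeded and the data copied back
// to the host matches what the kernel wrote.
bool roundTripRows() {
    bool ok = true;

    // Device pointers, stored in a host array (one allocation per row).
    int* devicePointersStoredInHostMemory[N] = {};
    for (int i = 0; i < N; i++) {
        ok = ok && cudaMalloc((void**)&devicePointersStoredInHostMemory[i],
                              sizeof(int) * SIZE) == cudaSuccess;
    }

    // The same device pointers, stored in a device array for the kernel.
    int** devicePointersStoredInDeviceMemory = nullptr;
    ok = ok && cudaMalloc((void**)&devicePointersStoredInDeviceMemory,
                          sizeof(int*) * N) == cudaSuccess;
    ok = ok && cudaMemcpy(devicePointersStoredInDeviceMemory,
                          devicePointersStoredInHostMemory,
                          sizeof(int*) * N,
                          cudaMemcpyHostToDevice) == cudaSuccess;

    if (ok) {
        fillRows<<<N, SIZE>>>(devicePointersStoredInDeviceMemory);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaDeviceSynchronize() == cudaSuccess;
    }

    // Copy each row back using the host-side copy of the device pointers.
    int hostRow[SIZE];
    for (int i = 0; ok && i < N; i++) {
        ok = cudaMemcpy(hostRow, devicePointersStoredInHostMemory[i],
                        sizeof(int) * SIZE,
                        cudaMemcpyDeviceToHost) == cudaSuccess;
        for (int j = 0; ok && j < SIZE; j++) {
            ok = (hostRow[j] == i * SIZE + j);
        }
    }

    // cudaFree accepts a null pointer, so this is safe after a failure.
    for (int i = 0; i < N; i++) {
        cudaFree(devicePointersStoredInHostMemory[i]);
    }
    cudaFree(devicePointersStoredInDeviceMemory);
    return ok;
}
```

Calling roundTripRows() from main and checking the returned flag is enough to try it. Note that the host never indexes devicePointersStoredInDeviceMemory; only the kernel does.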

EDIT in response to the comment:

The d_ptr is "an array of pointers". But the memory of this array is allocated with cudaMalloc. That means that it is located on the device. In contrast to that, with int* Mtx_on_GPU[N]; you are "allocating" N pointers in host memory. Instead of specifying the array size, you could also have used malloc. It may become clearer when you compare the following allocations:

int** pointersStoredInDeviceMemory;
cudaMalloc((void**)&pointersStoredInDeviceMemory, sizeof(int*)*N);

int** pointersStoredInHostMemory;
pointersStoredInHostMemory = (int**)malloc(N * sizeof(int*));

// This is not possible in host code, because the array was allocated with cudaMalloc:
int *pointerA = pointersStoredInDeviceMemory[0];

// This is possible because the array was allocated with malloc:    
int *pointerB = pointersStoredInHostMemory[0];
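If the host really does need a pointer that is stored in the device array, it can be fetched with cudaMemcpy instead of an index expression. The following is a minimal sketch of that idea, assuming a hypothetical N and a hypothetical function name pointerTableRoundTrip; it compares the fetched pointer with the one that was originally uploaded.

```cuda
#include <cuda_runtime.h>

// Hypothetical table size; not taken from the question.
constexpr int N = 4;

// Returns true if element 0 of the device-resident pointer table,
// fetched with cudaMemcpy, equals the pointer that was uploaded.
bool pointerTableRoundTrip() {
    bool ok = true;

    // Device pointers kept on the host (tiny allocations, for illustration).
    int* original[N] = {};
    for (int i = 0; i < N; i++) {
        ok = ok && cudaMalloc((void**)&original[i], sizeof(int)) == cudaSuccess;
    }

    // Upload the pointer table to device memory.
    int** pointersStoredInDeviceMemory = nullptr;
    ok = ok && cudaMalloc((void**)&pointersStoredInDeviceMemory,
                          sizeof(int*) * N) == cudaSuccess;
    ok = ok && cudaMemcpy(pointersStoredInDeviceMemory, original,
                          sizeof(int*) * N,
                          cudaMemcpyHostToDevice) == cudaSuccess;

    // Instead of pointersStoredInDeviceMemory[0] in host code,
    // copy the single table entry back to the host.
    int* pointerA = nullptr;
    ok = ok && cudaMemcpy(&pointerA, pointersStoredInDeviceMemory,
                          sizeof(int*),
                          cudaMemcpyDeviceToHost) == cudaSuccess;
    ok = ok && (pointerA == original[0]);

    for (int i = 0; i < N; i++) {
        cudaFree(original[i]);
    }
    cudaFree(pointersStoredInDeviceMemory);
    return ok;
}
```

The fetched pointerA is still a device address, so it can be passed to cudaMemcpy or a kernel but must not be dereferenced on the host.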

It may be a little bit brain-twisting to keep track of

  • the type of the memory where the pointers are stored
  • the type of the memory that the pointers are pointing to

but fortunately, it rarely goes beyond two levels of indirection.

Marco13 answered Sep 29 '22 02:09