I wrote my sample code like this:
int ** d_ptr;
cudaMalloc( (void**)&d_ptr, sizeof(int*)*N );
int* tmp_ptr[N];
for(int i=0; i<N; i++)
cudaMalloc( (void**)&tmp_ptr[i], sizeof(int)*SIZE );
cudaMemcpy(d_ptr, tmp_ptr, sizeof(tmp_ptr), cudaMemcpyHostToDevice);
This code runs fine, but after launching the kernel I can't retrieve the result:
int* Mtx_on_GPU[N];
cudaMemcpy(Mtx_on_GPU, d_ptr, sizeof(int)*N*SIZE, cudaMemcpyDeviceToHost);
At this point a segmentation fault occurs, but I don't know what I'm doing wrong.
int* Mtx_on_GPU[N];
for(int i=0; i<N; i++)
cudaMemcpy(Mtx_on_GPU[i], d_ptr[i], sizeof(int)*SIZE, cudaMemcpyDeviceToHost);
This code produces the same error.
I'm sure my code has a mistake somewhere, but I've been unable to find it all day.
Could you give me some advice?
In the last line
cudaMemcpy(Mtx_on_GPU[i], d_ptr[i], sizeof(int)*SIZE, cudaMemcpyDeviceToHost);
you are trying to copy data from the device to the host (NOTE: I assume that you allocated host memory for the Mtx_on_GPU pointers!).
However, the pointers d_ptr[i] are stored in device memory, so you can't dereference them directly from the host side. The line should be
cudaMemcpy(Mtx_on_GPU[i], tmp_ptr[i], sizeof(int)*SIZE, cudaMemcpyDeviceToHost);
This may become clearer when using "overly elaborate" variable names:
int ** devicePointersStoredInDeviceMemory;
cudaMalloc( (void**)&devicePointersStoredInDeviceMemory, sizeof(int*)*N);
int* devicePointersStoredInHostMemory[N];
for(int i=0; i<N; i++)
cudaMalloc( (void**)&devicePointersStoredInHostMemory[i], sizeof(int)*SIZE );
cudaMemcpy(
devicePointersStoredInDeviceMemory,
devicePointersStoredInHostMemory,
sizeof(int*)*N, cudaMemcpyHostToDevice);
// Invoke kernel here, passing "devicePointersStoredInDeviceMemory"
// as an argument
...
int* hostPointersStoredInHostMemory[N];
for(int i=0; i<N; i++) {
int* hostPointer = hostPointersStoredInHostMemory[i];
// (allocate memory for hostPointer here!)
int* devicePointer = devicePointersStoredInHostMemory[i];
cudaMemcpy(hostPointer, devicePointer, sizeof(int)*SIZE, cudaMemcpyDeviceToHost);
}
EDIT in response to the comment:
The d_ptr is "an array of pointers", but the memory for this array was allocated with cudaMalloc. That means it is located on the device. In contrast, with int* Mtx_on_GPU[N]; you are "allocating" N pointers in host memory; instead of specifying the array size, you could also have used malloc. It may become clearer when you compare the following allocations:
int** pointersStoredInDeviceMemory;
cudaMalloc((void**)&pointersStoredInDeviceMemory, sizeof(int*)*N);
int** pointersStoredInHostMemory;
pointersStoredInHostMemory = (int**)malloc(N * sizeof(int*));
// This is not possible, because the array was allocated with cudaMalloc:
int *pointerA = pointersStoredInDeviceMemory[0];
// This is possible because the array was allocated with malloc:
int *pointerB = pointersStoredInHostMemory[0];
It may be a little brain-twisting to keep track of, but fortunately it hardly ever goes beyond two levels of indirection.