CUDA-transfer 2D array from host to device

Tags:

cuda

gpu

I have a 2D matrix in main. I want to transfer it from the host to the device. Can you tell me how I can allocate memory for it and copy it to device memory?

#define N 5
__global__ void kernel(int a[N][N]){
}
int main(void){

    int a[N][N];
    cudaMalloc(?);
    cudaMemcpy(?);
    kernel<<<N,N>>>(?);

}

1 Answer

Perhaps something like this is what you really had in mind:

#define N 5
__global__ void kernel(int *a)
{
    // Thread indexing within the grid - note that the x index varies
    // fastest, so threads are numbered in column major order.
    int tidx = threadIdx.x + blockIdx.x * blockDim.x;
    int tidy = threadIdx.y + blockIdx.y * blockDim.y;

    // a_ij = a[i][j], where a is stored in row major order
    int a_ij = a[tidy + tidx * N];
}

int main(void)
{
    int a[N][N], *a_device;
    const size_t a_size = sizeof(int) * size_t(N * N);
    cudaMalloc((void **)&a_device, a_size);
    cudaMemcpy(a_device, a, a_size, cudaMemcpyHostToDevice);

    // Launch a single N x N block so that threadIdx.y is actually used.
    // A 1D launch like kernel<<<N,N>>> would leave tidy at 0 and index
    // past the end of the array.
    dim3 threads(N, N);
    kernel<<<1, threads>>>(a_device);

    cudaFree(a_device);
    return 0;
}
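If you want something you can compile and sanity check end to end, here is a sketch with basic error checking and a copy back to the host. The CHECK macro and the increment performed by the kernel are my own additions for illustration, not part of the original answer:

#include <cstdio>
#include <cuda_runtime.h>

#define N 5

// Hypothetical helper: bail out of main with a message if a CUDA call fails.
#define CHECK(call)                                                     \
    do {                                                                \
        cudaError_t err_ = (call);                                      \
        if (err_ != cudaSuccess) {                                      \
            fprintf(stderr, "CUDA error: %s at %s:%d\n",                \
                    cudaGetErrorString(err_), __FILE__, __LINE__);      \
            return 1;                                                   \
        }                                                               \
    } while (0)

__global__ void kernel(int *a)
{
    int tidx = threadIdx.x + blockIdx.x * blockDim.x;
    int tidy = threadIdx.y + blockIdx.y * blockDim.y;
    if (tidx < N && tidy < N)
        a[tidy + tidx * N] += 1;   // touch every element exactly once
}

int main(void)
{
    int a[N][N] = {{0}}, *a_device;
    const size_t a_size = sizeof(int) * size_t(N * N);

    CHECK(cudaMalloc((void **)&a_device, a_size));
    CHECK(cudaMemcpy(a_device, a, a_size, cudaMemcpyHostToDevice));

    dim3 threads(N, N);                 // one N x N block covers the matrix
    kernel<<<1, threads>>>(a_device);
    CHECK(cudaGetLastError());          // launch configuration errors
    CHECK(cudaDeviceSynchronize());     // errors raised during execution

    CHECK(cudaMemcpy(a, a_device, a_size, cudaMemcpyDeviceToHost));
    CHECK(cudaFree(a_device));

    printf("a[0][0] = %d (expect 1)\n", a[0][0]);
    return 0;
}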

The point you might have missed is that when you statically declare an array like a[N][N], it is really just a row major ordered piece of linear memory. The compiler automatically converts between a[i][j] and a[j + i*N] when it emits code. On the GPU, you must use the second form of access to read the memory you copied from the host.
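You can convince yourself of this flattening on the host alone. Here is a small plain C check (N and the values are arbitrary, chosen just to make the mapping visible):

#include <stdio.h>

#define N 5

int main(void)
{
    int a[N][N];
    int *flat = &a[0][0];    // the same 25 ints viewed as linear memory

    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j)
            a[i][j] = 10 * i + j;

    // The compiler's a[i][j] and the manual flat[j + i*N] name the same element.
    printf("a[2][3] = %d, flat[3 + 2*N] = %d\n", a[2][3], flat[3 + 2 * N]);
    return 0;
}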

Answered by talonmies


