I am currently going through the tutorial examples on http://code.google.com/p/stanford-cs193g-sp2010/ to learn CUDA. The code which demonstrates __global__ functions is given below. It simply creates two arrays, one on the CPU and one on the GPU, populates the GPU array with the number 7, and copies the GPU array data into the CPU array.
#include <stdlib.h>
#include <stdio.h>

__global__ void kernel(int *array)
{
    int index = blockIdx.x * blockDim.x + threadIdx.x;
    array[index] = 7;
}

int main(void)
{
    int num_elements = 256;
    int num_bytes = num_elements * sizeof(int);

    // pointers to host & device arrays
    int *device_array = 0;
    int *host_array = 0;

    // malloc a host array
    host_array = (int*)malloc(num_bytes);

    // cudaMalloc a device array
    cudaMalloc((void**)&device_array, num_bytes);

    int block_size = 128;
    int grid_size = num_elements / block_size;

    kernel<<<grid_size, block_size>>>(device_array);

    // download and inspect the result on the host:
    cudaMemcpy(host_array, device_array, num_bytes, cudaMemcpyDeviceToHost);

    // print out the result element by element
    for(int i = 0; i < num_elements; ++i)
    {
        printf("%d ", host_array[i]);
    }

    // deallocate memory
    free(host_array);
    cudaFree(device_array);
}
My question is: why is the cudaMalloc((void**)&device_array, num_bytes); statement written with a double pointer? Even the definition of cudaMalloc() says the first argument is a double pointer. Why not simply return a pointer to the beginning of the allocated memory on the GPU, just like the malloc function does on the CPU?
cudaMalloc is a function that can be called from the host or the device to allocate memory on the device, much like malloc for the host. The memory allocated with cudaMalloc must be freed with cudaFree.
A double pointer is used whenever a function needs to modify the caller's pointer itself, not just the data it points to. If you want an allocation made inside a function to still be reachable through the caller's variable after the function returns, you pass the address of that variable, i.e. a ** argument, so the function can write the new address into it.
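As a plain-C illustration of that idea (not part of the tutorial code; the function names allocate_broken and allocate_ints are made up for this sketch), compare passing the pointer by value with passing its address:

#include <stdlib.h>
#include <stdio.h>

/* Receives a copy of the caller's pointer; the caller's
   variable is NOT updated, so the allocation is lost. */
void allocate_broken(int *p, size_t n)
{
    p = (int*)malloc(n * sizeof(int));   /* only changes the local copy */
}

/* Receives the ADDRESS of the caller's pointer, so the
   function can store the new allocation into it. */
void allocate_ints(int **p, size_t n)
{
    *p = (int*)malloc(n * sizeof(int));
}

int main(void)
{
    int *a = NULL;
    allocate_ints(&a, 256);        /* same pattern as cudaMalloc((void**)&device_array, ...) */
    printf("%p\n", (void*)a);      /* non-NULL on success */
    free(a);
    return 0;
}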
From the online documentation:

cudaError_t cudaMemset(void *devPtr, int value, size_t count)

Fills the first count bytes of the memory area pointed to by devPtr with the constant byte value value.
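For example, in the tutorial program above the device array could be cleared before launching the kernel (a small sketch reusing device_array and num_bytes from the example; note that cudaMemset works byte-wise, so 0 is a safe fill value):

// set every byte of the device allocation to 0
cudaMemset(device_array, 0, num_bytes);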
cudaMemcpy() blocks the CPU until the copy is complete; the copy begins once all preceding CUDA calls have completed. cudaMemcpyAsync() is asynchronous and does not block the CPU.
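As a rough sketch of the asynchronous variant (the stream variable and the use of pinned host memory here are assumptions for illustration, not part of the tutorial; cudaMemcpyAsync needs page-locked host memory to actually overlap with other work):

cudaStream_t stream;
cudaStreamCreate(&stream);

int *pinned_host_array = 0;
cudaMallocHost((void**)&pinned_host_array, num_bytes);   // page-locked host memory

// enqueue the copy; control returns to the CPU immediately
cudaMemcpyAsync(pinned_host_array, device_array, num_bytes,
                cudaMemcpyDeviceToHost, stream);

// wait until the copy in this stream has finished before using the data
cudaStreamSynchronize(stream);

cudaFreeHost(pinned_host_array);
cudaStreamDestroy(stream);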
All CUDA API functions return an error code (or cudaSuccess if no error occurred), so the actual results have to be handed back through the parameters. Plain C has no references, which is why you must pass the address of the variable in which you want the result stored. Since cudaMalloc hands back a pointer, you need to pass the address of a pointer, i.e. a double pointer.
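A minimal sketch of how that returned error code is typically checked (the error-handling pattern shown is just a common convention, not something the tutorial itself does):

int *device_array = 0;
cudaError_t err = cudaMalloc((void**)&device_array, num_bytes);

if (err != cudaSuccess)
{
    // cudaGetErrorString turns the error code into a readable message
    fprintf(stderr, "cudaMalloc failed: %s\n", cudaGetErrorString(err));
    return 1;
}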
Another well-known function that takes addresses for the same reason is scanf. How many times have you forgotten to write the & before the variable you want the value stored in? ;)

int i;
scanf("%d", &i);