I have written some code to try to swap quadrants of a 2D matrix for FFT purposes, that is stored in a flat array.
int leftover = W-dcW;
T *temp;
T *topHalf;
cudaMalloc((void **)&temp, dcW * sizeof(T));
//swap every row, left and right
for(int i = 0; i < H; i++)
{
cudaMemcpy(temp, &data[i*W], dcW*sizeof(T),cudaMemcpyDeviceToDevice);
cudaMemcpy(&data[i*W],&data[i*W+dcW], leftover*sizeof(T), cudaMemcpyDeviceToDevice);
cudaMemcpy(&data[i*W+leftover], temp, dcW*sizeof(T), cudaMemcpyDeviceToDevice);
}
cudaMalloc((void **)&topHalf, dcH*W* sizeof(T));
leftover = H-dcH;
cudaMemcpy(topHalf, data, dcH*W*sizeof(T), cudaMemcpyDeviceToDevice);
cudaMemcpy(data, &data[dcH*W], leftover*W*sizeof(T), cudaMemcpyDeviceToDevice);
cudaMemcpy(&data[leftover*W], topHalf, dcH*W*sizeof(T), cudaMemcpyDeviceToDevice);
Notice that this code takes device pointers, and does DeviceToDevice transfers.
Why does this seem to run so slow? Can this be optimized somehow? I timed this compared to the same operation on host using regular memcpy and it was about 2x slower.
Any ideas?
Before we can use CUDA streams, we need to understand the notion of device synchronization. This is an operation where the host blocks any further execution until all operations issued to the GPU (memory transfers and kernel executions) have completed.
– Pinned memory are virtual memory pages that are specially marked so that. they cannot be paged out. – Allocated with a special system API function call. – a.k.a. Page Locked Memory, Locked Pages, etc. – CPU memory that serve as the source or destination of a DMA transfer must.
Global memory can be considered the main memory space of the GPU in CUDA. It is allocated, and managed, by the host, and it is accessible to both the host and the GPU, and for this reason the global memory space can be used to exchange data between the two.
Data transfers using host pinned memory use the same cudaMemcpy() syntax as transfers with pageable memory.
I ended up writing a kernel to do the swaps. This was indeed faster than the Device to Device memcpy operations
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With