CUDA Device To Device transfer expensive

I have written some code to swap the quadrants of a 2D matrix for FFT purposes. The matrix is stored in a flat array.

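    // data is a device pointer to an H x W matrix stored row by row in a flat array.
    // dcW and dcH are the width and height of the top-left quadrant.
    int leftover = W - dcW;

    T *temp;
    T *topHalf;
    cudaMalloc((void **)&temp, dcW * sizeof(T));

    // Swap the left and right parts of every row (three copies per row).
    // NOTE: when leftover > dcW, the source and destination of the second copy overlap.
    for (int i = 0; i < H; i++)
    {
        cudaMemcpy(temp, &data[i*W], dcW*sizeof(T), cudaMemcpyDeviceToDevice);
        cudaMemcpy(&data[i*W], &data[i*W+dcW], leftover*sizeof(T), cudaMemcpyDeviceToDevice);
        cudaMemcpy(&data[i*W+leftover], temp, dcW*sizeof(T), cudaMemcpyDeviceToDevice);
    }

    // Swap the top and bottom parts of the matrix.
    cudaMalloc((void **)&topHalf, dcH * W * sizeof(T));
    leftover = H - dcH;
    cudaMemcpy(topHalf, data, dcH*W*sizeof(T), cudaMemcpyDeviceToDevice);
    cudaMemcpy(data, &data[dcH*W], leftover*W*sizeof(T), cudaMemcpyDeviceToDevice);
    cudaMemcpy(&data[leftover*W], topHalf, dcH*W*sizeof(T), cudaMemcpyDeviceToDevice);

    // Release the scratch buffers.
    cudaFree(temp);
    cudaFree(topHalf);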

Note that this code takes device pointers and performs device-to-device transfers.

Why does this run so slowly? Can it be optimized somehow? I timed it against the same operation on the host using regular memcpy, and the device version was about 2x slower.

Any ideas?

asked May 19 '11 19:05

Derek

1 Answer

I ended up writing a kernel to do the swaps. This was indeed faster than the device-to-device memcpy operations.
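The kernel itself is not shown in this answer. As a rough illustration only, here is a minimal sketch of what such a swap kernel could look like. It is an out-of-place variant (separate input and output buffers, to avoid threads overwriting data other threads still need), which is an assumption and not necessarily what was actually used. The names `shiftQuadrants` and `shiftOnDevice` are hypothetical, error checking is omitted for brevity, and the sketch assumes `0 <= dcW <= W` and `0 <= dcH <= H`. A single kernel launch replaces the `3*H + 3` separate `cudaMemcpy` calls in the question.

```cuda
#include <cassert>
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>

// Out-of-place quadrant swap (hypothetical sketch). The element at (x, y)
// moves to ((x + W - dcW) % W, (y + H - dcH) % H), which gives the same
// result as the row-wise and half-wise copy sequence in the question.
template <typename T>
__global__ void shiftQuadrants(const T *in, T *out, int W, int H, int dcW, int dcH)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= W || y >= H) return;  // threads outside the matrix do nothing

    int nx = (x + W - dcW) % W;    // destination column
    int ny = (y + H - dcH) % H;    // destination row
    out[(size_t)ny * W + nx] = in[(size_t)y * W + x];
}

// Hypothetical host helper for demonstration: copies the matrix to the
// device, runs the kernel once, and copies the result back.
template <typename T>
std::vector<T> shiftOnDevice(const std::vector<T> &host, int W, int H, int dcW, int dcH)
{
    assert(host.size() == (size_t)W * H);
    size_t bytes = host.size() * sizeof(T);

    T *dIn = nullptr;
    T *dOut = nullptr;
    cudaMalloc((void **)&dIn, bytes);
    cudaMalloc((void **)&dOut, bytes);
    cudaMemcpy(dIn, host.data(), bytes, cudaMemcpyHostToDevice);

    // One thread per element, 16x16 threads per block (an arbitrary choice).
    dim3 block(16, 16);
    dim3 grid((W + block.x - 1) / block.x, (H + block.y - 1) / block.y);
    shiftQuadrants<T><<<grid, block>>>(dIn, dOut, W, H, dcW, dcH);

    // This blocking copy on the default stream waits for the kernel to finish.
    std::vector<T> result(host.size());
    cudaMemcpy(result.data(), dOut, bytes, cudaMemcpyDeviceToHost);

    cudaFree(dIn);
    cudaFree(dOut);
    return result;
}
```

If the data already lives on the device, as in the question, only the kernel launch is needed. The helper's host-to-device and device-to-host copies are there just to make the sketch easy to check against a CPU result.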

answered Sep 18 '22 20:09

Derek

Tags: c++, cuda, fft
