GPUDirect RDMA transfer from GPU to remote host

Scenario:

I have two machines, a client and a server, connected with InfiniBand. The server machine has an NVIDIA Fermi GPU, but the client machine has no GPU. I have an application running on the server that uses the GPU for some calculations. The result data on the GPU is never used by the server machine, but is instead sent directly to the client machine without any processing. Right now I'm doing a cudaMemcpy to get the data from the GPU to the server's system memory, then sending it off to the client over a socket. I'm using SDP to enable RDMA for this communication.

Question:

Is it possible for me to take advantage of NVIDIA's GPUDirect technology to get rid of the cudaMemcpy call in this situation? I believe I have the GPUDirect drivers correctly installed, but I don't know how to initiate the data transfer without first copying it to the host.

My guess is that it isn't possible to use SDP in conjunction with GPUDirect, but is there some other way to initiate an RDMA data transfer from the server machine's GPU to the client machine?

Bonus: If someone has a simple way to test whether I have the GPUDirect dependencies correctly installed, that would be helpful as well!


asked Aug 14 '12 10:08

DaoWen

1 Answer

Yes, this is possible with GPUDirect RDMA, provided the networking hardware and its driver support it. See the GPUDirect RDMA documentation.
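On the bonus question, one rough check (a sketch, not an official test) is to look for a loaded peer-memory kernel module, since that module is what lets the InfiniBand driver register GPU memory. The module names below, nv_peer_mem and nvidia_peermem, are assumptions about the installed stack; a Linux host with /proc/modules is also assumed, and the helper names are made up for illustration. Note too that NVIDIA's documentation lists Kepler-class GPUs and CUDA 5.0 as the starting point for GPUDirect RDMA, so this check may not be meaningful on a Fermi card.

```python
from pathlib import Path

# Assumed module names: nv_peer_mem (shipped with Mellanox OFED setups) and
# nvidia_peermem (shipped with newer NVIDIA drivers). Adjust for your stack.
PEER_MEMORY_MODULES = ("nv_peer_mem", "nvidia_peermem")


def loaded_peer_memory_modules(modules_text):
    """Return the peer-memory module names present in /proc/modules-style text.

    Each line of /proc/modules starts with the module name, so only the
    first whitespace-separated field of each line is compared.
    """
    loaded = {line.split()[0] for line in modules_text.splitlines() if line.strip()}
    return [name for name in PEER_MEMORY_MODULES if name in loaded]


def check_host(proc_modules="/proc/modules"):
    """Read the module list from this host; return [] if it is unavailable."""
    path = Path(proc_modules)
    if not path.exists():
        return []
    return loaded_peer_memory_modules(path.read_text())


if __name__ == "__main__":
    found = check_host()
    print("peer-memory modules loaded:", ", ".join(found) if found else "none")
```

A loaded module is necessary rather than sufficient: it shows the kernel-side piece is present, not that a transfer works end to end. If it is present, the usual next step is to pass the pointer returned by cudaMalloc to ibv_reg_mr and use the resulting memory region in ordinary RDMA verbs calls, in place of the cudaMemcpy plus socket send.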

answered Sep 22 '22 16:09

harrism

Tags: cuda, infiniband, gpudirect, rdma