I am trying to figure out if using cudaHostAlloc (or cudaMallocHost?) is appropriate.
I am trying to run a kernel where my input data is larger than the amount of memory available on the GPU.
Can I cudaMallocHost more space than there is on the GPU? If not, and let's say I allocate 1/4 of the space that I need (which will fit on the GPU), is there any advantage to using pinned memory?
I would essentially still have to copy from that 1/4-sized buffer into my full-size malloc'd buffer, and that's probably no faster than just using normal cudaMalloc, right?
Is this typical usage scenario correct for using cudaMallocHost:

1. allocate pinned host memory with cudaMallocHost (call it h_p)
2. populate h_p with input data
3. get a device pointer for h_p
4. run the GPU kernel with that device pointer, modifying the contents of h_p
5. read the modified results back out of h_p on the host
So - no copy has to happen between step 4 and 5, right?
If that is correct, then I can at least see the advantage for kernels whose data fits on the GPU all at once.
Memory transfer is an important factor when it comes to the performance of CUDA applications. cudaMallocHost can do two things:

- allocate pinned (page-locked) host memory: if host memory allocated this way is involved in a cudaMemcpy as either source or destination, the CUDA runtime will be able to perform an optimized memory transfer.
- allocate mapped memory: this is also page-locked, but it is mapped into the CUDA address space and can be accessed directly from kernel code. To use it you have to set the cudaDeviceMapHost flag using cudaSetDeviceFlags before using any other CUDA function. The GPU memory size does not limit the size of mapped host memory.

I'm not sure about the performance of the latter technique. It could allow you to overlap computation and communication very nicely.
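For the first of those two cases (pinned memory used with explicit copies), a minimal sketch might look like this; the buffer names and size are made up for illustration and error checking is omitted:

```
#include <cuda_runtime.h>

int main(void)
{
    const size_t N = 1 << 24;               // arbitrary example size
    const size_t bytes = N * sizeof(float);

    float *h_p = NULL;                      // pinned (page-locked) host buffer
    float *d_p = NULL;                      // device buffer

    // cudaMallocHost gives page-locked host memory; cudaMemcpy to/from it
    // can be faster than a copy from ordinary pageable malloc'd memory.
    cudaMallocHost((void **)&h_p, bytes);
    cudaMalloc((void **)&d_p, bytes);

    for (size_t i = 0; i < N; ++i)          // fill input on the host
        h_p[i] = (float)i;

    // This is the transfer that benefits from the pinned allocation.
    cudaMemcpy(d_p, h_p, bytes, cudaMemcpyHostToDevice);

    // ... launch kernels on d_p here ...

    cudaMemcpy(h_p, d_p, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_p);
    cudaFreeHost(h_p);
    return 0;
}
```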
If you access the memory in blocks inside your kernel (i.e. you don't need the entire data but only a section) you could use a multi-buffering method utilizing asynchronous memory transfers with cudaMemcpyAsync, by having multiple buffers on the GPU: compute on one buffer, transfer one buffer to the host and transfer one buffer to the device at the same time.
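As a sketch of that multi-buffering idea (the process_chunk kernel, names and sizes are placeholders I made up): the input is processed in chunks, and because consecutive chunks alternate between two streams, the copies and the compute of different chunks can overlap. Note that cudaMemcpyAsync only overlaps with other work when the host buffer is pinned:

```
#include <cuda_runtime.h>

// Placeholder kernel: processes one chunk of data in place.
__global__ void process_chunk(float *d_buf, size_t n)
{
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n)
        d_buf[i] *= 2.0f;                 // stand-in for real work
}

int main(void)
{
    const size_t total = 1 << 26;         // more data than we want on the GPU at once
    const size_t chunk = 1 << 22;         // elements per chunk
    const int    nbufs = 2;               // double buffering

    float *h_data = NULL;                 // pinned host buffer holding ALL the data
    cudaMallocHost((void **)&h_data, total * sizeof(float));
    for (size_t i = 0; i < total; ++i)
        h_data[i] = 1.0f;                 // example input

    float       *d_buf[nbufs];
    cudaStream_t stream[nbufs];
    for (int b = 0; b < nbufs; ++b) {
        cudaMalloc((void **)&d_buf[b], chunk * sizeof(float));
        cudaStreamCreate(&stream[b]);
    }

    for (size_t offset = 0; offset < total; offset += chunk) {
        int    b = (int)((offset / chunk) % nbufs);   // which buffer/stream to use
        size_t n = (total - offset < chunk) ? (total - offset) : chunk;

        // Copy in, compute, copy out -- all queued in the same stream,
        // so work queued in the two streams can overlap.
        cudaMemcpyAsync(d_buf[b], h_data + offset, n * sizeof(float),
                        cudaMemcpyHostToDevice, stream[b]);
        process_chunk<<<(unsigned)((n + 255) / 256), 256, 0, stream[b]>>>(d_buf[b], n);
        cudaMemcpyAsync(h_data + offset, d_buf[b], n * sizeof(float),
                        cudaMemcpyDeviceToHost, stream[b]);
    }
    cudaDeviceSynchronize();              // wait for all chunks to finish

    for (int b = 0; b < nbufs; ++b) {
        cudaStreamDestroy(stream[b]);
        cudaFree(d_buf[b]);
    }
    cudaFreeHost(h_data);
    return 0;
}
```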
I believe your assertions about the usage scenario are correct when using the cudaDeviceMapHost type of allocation. You do not have to do an explicit copy, but there certainly will be an implicit copy that you don't see. There's a chance it overlaps nicely with your computation. Note that you might need to synchronize after the kernel call to make sure the kernel has finished and that you have the modified content in h_p.
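To make that concrete, here is a minimal sketch of the mapped (zero-copy) scenario, roughly following your numbered steps; scale_kernel and the size N are made up for illustration and error checking is omitted:

```
#include <cuda_runtime.h>
#include <stdio.h>

// Placeholder kernel that modifies the mapped host memory through d_p.
__global__ void scale_kernel(float *d_p, size_t n)
{
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n)
        d_p[i] *= 2.0f;
}

int main(void)
{
    const size_t N = 1 << 20;

    // Must be set before any other CUDA call that creates a context.
    cudaSetDeviceFlags(cudaDeviceMapHost);

    // 1. Allocate mapped, page-locked host memory.
    float *h_p = NULL;
    cudaHostAlloc((void **)&h_p, N * sizeof(float), cudaHostAllocMapped);

    // 2. Populate it with input data on the host.
    for (size_t i = 0; i < N; ++i)
        h_p[i] = (float)i;

    // 3. Get the device-side pointer that aliases h_p.
    float *d_p = NULL;
    cudaHostGetDevicePointer((void **)&d_p, h_p, 0);

    // 4. Run the kernel on the mapped memory (no cudaMemcpy anywhere).
    scale_kernel<<<(unsigned)((N + 255) / 256), 256>>>(d_p, N);

    // 5. Synchronize before the host reads the results out of h_p.
    cudaDeviceSynchronize();
    printf("h_p[100] = %f\n", h_p[100]);

    cudaFreeHost(h_p);
    return 0;
}
```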