In order to reduce the transfer time from host to device for my application, I want to use pinned memory. NVIDIA's best practices guide proposes mapping buffers and writing the data using the following code:
cDataIn = (unsigned char*)clEnqueueMapBuffer(cqCommandQue, cmPinnedBufIn, CL_TRUE, CL_MAP_WRITE, 0, memSize, 0, NULL, NULL, NULL);
for(unsigned int i = 0; i < memSize; i++)
{
    cDataIn[i] = (unsigned char)(i & 0xff);
}
clEnqueueWriteBuffer(cqCommandQue, cmDevBufIn, CL_FALSE, 0,
                     szBuffBytes, cDataIn, 0, NULL, NULL);
Intel's optimization guide recommends using calls to clEnqueueMapBuffer and clEnqueueUnmapMemObject instead of calls to clEnqueueReadBuffer or clEnqueueWriteBuffer.
What is the right way to use pinned memory/mapped memory? Is it necessary to write the data using enqueueWriteBuffer or is enqueueMapBuffer sufficient?
Also, what is the difference between CL_MEM_ALLOC_HOST_PTR and CL_MEM_USE_HOST_PTR?
This is an interesting topic that very few people explain in detail. I will try to define exactly how it works.
Pinned memory refers to memory that, as well as being on the device, also exists on the host as page-locked memory, so a DMA transfer is possible between the two, which increases copy performance. That is why it needs CL_MEM_ALLOC_HOST_PTR in the buffer creation parameters.
On the other hand, CL_MEM_USE_HOST_PTR takes a host pointer that you supply at buffer creation. The spec does not make clear whether this memory can or cannot be pinned. Generally speaking, memory created this way should NOT be assumed to be pinned, since the host pointer has not been reserved by the OpenCL API and it is not clear where it resides in memory.
Regarding the Map/Read question: both are OK, and they will give the same performance. The difference between the two techniques is that with Read/Write you have to hold on to the buffer and its mapped pointer (buffer+Mapped_Buffer) all along. The good thing is that you can then just clEnqueueRead/Write to that mapped pointer. The API will wait for the pinned data to be consistent and then consider the operation done. It is like doing a map+unmap in one shot. The Read/Write mode is easier to use, especially for repetitive reads, but it is not as versatile as the manual map option, since you CAN'T write to a read-only map, nor read from a write-only map. For general use, though, the variables that are read will never be written, and vice versa.
My understanding is that Intel's recommendation means "Use Map, not plain Read/Write", rather than "When you use Map, don't use Read/Write over mapped pointers".
Did you check this NVIDIA recommendation on Intel hardware? I think it should work, but I don't know whether the operation would actually be optimal (as it is on AMD or NVIDIA hardware).