I am a little bit confused about how exactly zero-copy works.
1- I want to confirm that the following corresponds to zero-copy in OpenCL.
Case A: CPU directly accessing GPU memory over PCI-E. The memory object lives in GPU RAM, and the GPU accesses it locally (c1). The CPU reads and writes the memory object directly across PCI-E, so the copy between system RAM and the CPU (c3) and the PCI-E copy between system RAM and GPU RAM (c2) are avoided.

Case B: GPU directly accessing system memory over PCI-E. The memory object lives in system RAM, and the CPU accesses it locally (c3). The GPU reads and writes the memory object directly across PCI-E, so the PCI-E copy between system RAM and GPU RAM (c2) and the copy between GPU RAM and the GPU (c1) are avoided.
In short, either the CPU accesses GPU RAM directly or the GPU accesses system RAM directly, without an explicit copy.
2- What is the advantage of this? PCI-E still limits the overall bandwidth. Or is the only advantage that we avoid the copies c2 and c1/c3 in the situations above?
A document that augments the OpenCL API specification with guidance specific to Intel processor graphics explains that performance is improved by eliminating extra copies during execution, referred to as zero copy behavior. Key takeaway: to create zero copy buffers, one option is to use CL_MEM_ALLOC_HOST_PTR and let the runtime handle creating a zero copy allocation for you.
Zero copy: Refers to the concept of using the same copy of memory between the host, in this case the CPU, and the device, in this case the integrated GPU, with the goal of increasing performance and reducing the overall memory footprint of the application by reducing the number of copies of data.
When directly accessing any buffer on the host, zero copy or not, OpenCL 1.2 requires you to map and unmap the buffer. Accessing the buffer on the device is no different than accessing any other buffer; no code change is required.
You are correct in your understanding of how zero-copy works. The basic premise is that you can access either the host memory from the device or the device memory from the host without an intermediate buffering step.
You can perform zero-copy by creating buffers with the following flags:
CL_MEM_USE_PERSISTENT_MEM_AMD // Device-Resident Memory (AMD extension)
CL_MEM_ALLOC_HOST_PTR // Host-Resident Memory
Then, the buffers can be accessed using memory mapping semantics:
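cl_int err;
// Blocking map (CL_TRUE): p is valid as soon as the call returns.
void* p = clEnqueueMapBuffer(queue, buffer, CL_TRUE, CL_MAP_WRITE, 0, size, 0, NULL, NULL, &err);
//Perform writes to the buffer p
// Unmap before any kernel uses the buffer again.
err = clEnqueueUnmapMemObject(queue, buffer, p, 0, NULL, NULL);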
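Using zero-copy, you may be able to outperform an implementation that prepares the data in separate steps, for example by filling an ordinary host array and then copying it into the buffer.
Instead, you could do it all in one step by writing directly through the mapped pointer.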
On some implementations, however, the map and unmap calls merely hide the cost of a data transfer: the runtime still copies the data behind the scenes when you map and unmap.
If the implementation behaves this way, there is no benefit to using the mapping approach. However, AMD's newer OpenCL drivers allow the data to be written directly, making the cost of mapping and unmapping almost zero. For discrete graphics cards, the requests still travel over the PCIe bus, so data transfers can be slow.
On an APU, however, zero-copy semantics can greatly speed up transfers because of the APU's architecture: the PCIe bus is replaced by the Unified North Bridge (UNB), which allows faster transfers.
BE AWARE that when using zero-copy semantics with memory mapping, you will see absolutely horrendous bandwidth when reading a device-side buffer from the host. These bandwidths are on the order of 0.01 Gb/s and can easily become a new bottleneck in your code.
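To put that figure in perspective, here is a quick estimate of how long a host-side read takes at a given bandwidth, assuming the figure means gigabits per second as written (if it is gigabytes per second, divide the result by 8). read_seconds is a hypothetical helper.

```c
/* Hypothetical helper: seconds needed to move `bytes` at `gbit_per_s`
 * gigabits per second (1 Gb = 1e9 bits, 8 bits per byte). */
double read_seconds(double bytes, double gbit_per_s)
{
    return (bytes * 8.0) / (gbit_per_s * 1e9);
}
```

For example, read_seconds(100e6, 0.01) gives about 80 seconds to read back a 100 MB buffer at 0.01 Gb/s.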
Sorry if this is too much information. This was my thesis topic.