I just read an article that explains the zero-copy mechanism.
It talks about the difference between zero-copy with and without Scatter/Gather supports.
NIC without SG support, the data copies are as follows
NIC with SG support, the data copies are as follows
In a word, zero-copy with SG support can eliminate one CPU copy.
My question is that why data in kernel buffer could be scattered?
Because the Linux kernel's mapping / memory allocation facilities by default will create virtually-contiguous but possibly physically-disjoint memory regions.
That means the read from the filesystem which sendfile()
does internally goes to a buffer in kernel virtual memory, which the DMA code has to "transmogrify" (for lack of a better word) into something that the network card's DMA engine can grok.
Since DMA (often but not always) uses physical addresses, that means you either duplicate the data buffer (into a specially-allocated physically-contigous region of memory, your socket buffer above), or else transfer it one-physical-page-at-a-time.
If your DMA engine, on the other hand, is capable of aggregating multiple physically-disjoint memory regions into a single data transfer (that's called "scatter-gather") then instead of copying the buffer, you can simply pass a list of physical addresses (pointing to physically-contigous sub-segments of the kernel buffer, that's your aggregate descriptors above) and you no longer need to start a separate DMA transfer for each physical page. This is usually faster, but whether it can be done or not depends on the capabilities of the DMA engine.
Re: My question is that why data in kernel buffer could be scattered?
Because it already is scattered. The data queue in front of a TCP socket is not divided into the datagrams that will go out onto the network interface. Scatter allows you to keep the data where it is and not have to copy it to make a flat buffer that is acceptable to the hardware.
With the gather feature, you can give the network card a datagram which is broken into pieces at different addresses in memory, which can be references to the original socket buffers. The card will read it from those locations and send it as a single unit.
Without gather (hardware requires simple, linear buffers) a datagram has to be prepared as a contiguously allocated byte string, and all the data which belongs to it has to be memcpy
-d into place from the buffers that are queued for transmission on the socket.
Because when you write to a socket, the headers of the packet are assembled in a different place from your user-data, so to be coalesced into a network packet, the device needs "gather" capability, at least to get the headers and data.
Also to avoid the CPU having to read the data (and thus, fill its cache up with useless stuff it's never going to need again), the network card also needs to generate its own IP and TCP checksums (I'm assuming TCP here, because 99% of your bulk data transfers are going to be TCP). This is OK, because nowadays they all can.
What I'm not sure is, how this all interacts with TCP_CORK.
Most protocols tend to have their own headers, so a hypothetical protocol looks like:
Client: Send request Server: Send some metadata; send the file data
So we tend to have a server application assembling some headers in memory, issuing a write(), followed by a sendfile()-like operation. I suppose the headers still get copied into a kernel buffer in this case.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With