 

Does Linux have zero-copy? splice or sendfile?

Tags:

c

linux

zero-copy

When splice was introduced, it was discussed on the kernel mailing list that sendfile had been re-implemented on top of splice. The documentation for the splice flag SPLICE_F_MOVE states:

Attempt to move pages instead of copying. This is only a hint to the kernel: pages may still be copied if the kernel cannot move the pages from the pipe, or if the pipe buffers don't refer to full pages. The initial implementation of this flag was buggy: therefore starting in Linux 2.6.21 it is a no-op (but is still permitted in a splice() call); in the future, a correct implementation may be restored.

So does that mean Linux has no zero-copy method for writing to sockets? Or was this fixed at some point, with nobody updating the documentation for years? Does either sendfile or splice have a zero-copy implementation in any of the latest 3.x kernel versions?

Since Google has no answer to this query, I'm creating a Stack Overflow question for the next poor schmuck who wants to know whether there's any benefit to using vmsplice and splice, or sendfile, over plain old write.

asked Jun 17 '14 by Eloff




2 Answers

sendfile has been zero-copy ever since it was introduced, and it still is (assuming the hardware allows for it, which is usually the case). Being zero-copy was the entire point of having this syscall in the first place. sendfile is nowadays implemented as a wrapper around splice.
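A minimal sketch of the file-to-socket path sendfile is meant for, assuming a blocking output descriptor; `send_file_path` is a hypothetical helper name, not an existing API. sendfile may transfer fewer bytes than requested, so it is called in a loop with an explicit offset, and the data never passes through a user-space buffer.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: send the whole file at `path` to out_fd
 * (typically a connected TCP socket) with sendfile().
 * Assumes out_fd is blocking. Returns 0 on success, -1 on error. */
static int send_file_path(int out_fd, const char *path)
{
    int in_fd = open(path, O_RDONLY);
    if (in_fd == -1)
        return -1;

    struct stat st;
    if (fstat(in_fd, &st) == -1) {
        close(in_fd);
        return -1;
    }

    off_t offset = 0;   /* advanced by sendfile(); in_fd's own offset is untouched */
    int rc = 0;
    while (offset < st.st_size) {
        ssize_t n = sendfile(out_fd, in_fd, &offset,
                             (size_t)(st.st_size - offset));
        if (n == -1) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            rc = -1;
            break;
        }
        if (n == 0)
            break;              /* file was truncated while we were sending */
    }

    int saved = errno;          /* keep sendfile's errno across close() */
    close(in_fd);
    errno = saved;
    return rc;
}
```

Since Linux 2.6.33, out_fd may be any file, not only a socket, which makes a helper like this easy to try out against a plain output file.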

That suggests that splice, too, is zero-copy, and this is indeed the case, at least in theory and at least in some cases. The problem is figuring out how to use it correctly, so that it works reliably and is actually zero-copy. The documentation is... sparse, to say the least.

In particular, splice is only zero-copy if the pages were given to the kernel as a "gift" (the SPLICE_F_GIFT flag of vmsplice), i.e. you don't own them any more (formally, that is; in reality you still do). That is a non-issue if you simply splice a file descriptor onto a socket, but it is a big issue if you want to splice data from your application's address space, or from one pipe to another. It is unclear what to do with the pages afterwards, and when. The documentation states that you may never touch the pages again or do anything with them, not ever. So if you follow the letter of the documentation, you must leak the memory.
That obviously can't be right, but there is no good way of knowing (for you, at least!) when it's safe to reuse or release that memory. The kernel doing a sendfile would know, since as soon as it receives the TCP ACK, it knows that the data is never needed again. The problem is that you never get to see an ACK. All you know when splice has returned is that the data has been accepted for sending; you have no idea whether it has already been sent or received, nor when that will happen.
This means you need to figure it out somehow at the application layer, either by doing manual ACKs (which come for free with reliable UDP), or by assuming that if the other side sends an answer to your request, it must have gotten the request.
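A minimal sketch of that application-level approach, under stated assumptions: `alloc_pages` and `send_gifted_and_wait_ack` are hypothetical helpers, the peer is assumed to reply with exactly one ACK byte once it has the data, and treating the buffer as reusable after that ACK is the heuristic described above, not something the kernel documentation guarantees. The buffer is page-aligned and a whole number of pages, because gifting only makes sense for full pages.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: a page-aligned buffer of whole pages.
 * Returns NULL on failure; *len_out receives the size in bytes. */
static void *alloc_pages(size_t npages, size_t *len_out)
{
    size_t len = npages * (size_t)sysconf(_SC_PAGESIZE);
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    *len_out = len;
    return p;
}

/* Hypothetical helper: gift buf to the kernel with vmsplice(SPLICE_F_GIFT),
 * splice it from the pipe into the connected stream socket `sock`, then block
 * until the peer sends a 1-byte application-level ACK.
 * Returns 0 on success, -1 on error (EINTR handling omitted for brevity). */
static int send_gifted_and_wait_ack(int sock, void *buf, size_t len)
{
    int p[2];
    if (pipe(p) == -1)
        return -1;

    int rc = -1;
    char ack;
    size_t done = 0;
    while (done < len) {
        struct iovec iov = { .iov_base = (char *)buf + done,
                             .iov_len = len - done };
        /* may add only what fits into the pipe, so loop */
        ssize_t in = vmsplice(p[1], &iov, 1, SPLICE_F_GIFT);
        if (in <= 0)
            goto out;
        for (ssize_t left = in; left > 0; ) {   /* drain the pipe into the socket */
            ssize_t n = splice(p[0], NULL, sock, NULL, (size_t)left,
                               SPLICE_F_MOVE);
            if (n <= 0)
                goto out;
            left -= n;
        }
        done += (size_t)in;
    }

    /* splice() returning only means the socket accepted the data; wait for
     * the peer's ACK before treating the pages as ours again. */
    if (recv(sock, &ack, 1, MSG_WAITALL) != 1)
        goto out;
    rc = 0;
out:
    close(p[0]);
    close(p[1]);
    return rc;
}
```

If no ACK ever arrives, this sketch simply never hands the buffer back, which is the "leak it" outcome from the documentation's point of view.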

Another thing you have to manage is the finite pipe capacity. The default is small (16 pages since Linux 2.6.11, i.e. 64 KiB with 4 KiB pages), and even if you increase it with fcntl(F_SETPIPE_SZ), you can't just naively splice a file of any size in one go. sendfile, on the other hand, will just let you do that, which is cool.
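A minimal sketch of the chunked file → pipe → destination loop this implies. `splice_copy` is a hypothetical helper; the 1 MiB F_SETPIPE_SZ request is an arbitrary choice that may fail (for example above /proc/sys/fs/pipe-max-size), in which case the default capacity is used; and SPLICE_F_MOVE is passed only as the hint the question mentions. Each pass moves at most one pipe's worth of data.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: copy up to `count` bytes from in_fd (e.g. a regular
 * file) to out_fd (e.g. a socket) through an intermediate pipe with splice().
 * Returns the number of bytes transferred, or -1 on error. */
static ssize_t splice_copy(int out_fd, int in_fd, size_t count)
{
    int p[2];
    if (pipe(p) == -1)
        return -1;

    /* Optional: try to enlarge the pipe (Linux >= 2.6.35); failure is harmless. */
    (void)fcntl(p[1], F_SETPIPE_SZ, 1 << 20);

    size_t total = 0;
    ssize_t result = -1;
    while (total < count) {
        /* file -> pipe: moves at most what currently fits in the pipe */
        ssize_t in = splice(in_fd, NULL, p[1], NULL, count - total,
                            SPLICE_F_MOVE | SPLICE_F_MORE);
        if (in == -1 && errno == EINTR)
            continue;
        if (in == -1)
            goto out;
        if (in == 0)
            break;                  /* EOF on in_fd */

        /* pipe -> destination: drain everything we just put in */
        while (in > 0) {
            ssize_t n = splice(p[0], NULL, out_fd, NULL, (size_t)in,
                               SPLICE_F_MOVE | SPLICE_F_MORE);
            if (n == -1 && errno == EINTR)
                continue;
            if (n <= 0)
                goto out;
            in -= n;
            total += (size_t)n;
        }
    }
    result = (ssize_t)total;
out:
    close(p[0]);
    close(p[1]);
    return result;
}
```

Compared with sendfile, the caller now owns the pipe and the chunking loop, which is exactly the bookkeeping sendfile hides.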

All in all, sendfile is nice because it just works, and it works well, and you don't need to care about any of the above details. It's not a panacea, but it sure is a great addition.
I would, personally, stay away from splice and its family until the whole thing is greatly overhauled and until it is 100% clear what you have to do (and when) and what you don't have to do.

The real, effective gains over plain old write are marginal for most applications, anyway. I recall some less-than-polite comments by Mr. Torvalds a few years ago (when BSD had a form of write that would do some magic with remapping pages to get zero-copy, and Linux didn't) which pointed out that making a copy usually isn't an issue, but playing tricks with pages is [won't repeat that here].
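For comparison, a minimal sketch of the plain read()/write() path that sendfile and splice are meant to beat; `copy_rw` is a hypothetical name and the 64 KiB buffer size is an arbitrary choice. Every byte is copied from the kernel into `buf` and back out again, which is the copy that, per the point above, usually costs less than people expect.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: copy everything from in_fd to out_fd through a
 * user-space buffer. Returns bytes copied, or -1 on error. */
static ssize_t copy_rw(int out_fd, int in_fd)
{
    char buf[64 * 1024];
    ssize_t total = 0;
    for (;;) {
        ssize_t n = read(in_fd, buf, sizeof buf);   /* copy #1: kernel -> buf */
        if (n == 0)
            return total;                           /* EOF */
        if (n == -1) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        for (ssize_t off = 0; off < n; ) {          /* write() may be partial */
            ssize_t w = write(out_fd, buf + off, (size_t)(n - off));  /* copy #2 */
            if (w == -1) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += w;
        }
        total += n;
    }
}
```

Timing this against a sendfile loop on your own workload is the simplest way to check whether zero-copy buys anything measurable.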

answered Sep 19 '22 by Damon


According to the splice(2) man page, as of 2014-07-08:

Though we talk of copying, actual copies are generally avoided. The kernel does this by implementing a pipe buffer as a set of reference-counted pointers to pages of kernel memory. The kernel creates "copies" of pages in a buffer by creating new pointers (for the output buffer) referring to the pages, and increasing the reference counts for the pages: only pointers are copied, not the pages of the buffer.

Therefore, yes, splice is documented as currently being zero-copy in most cases.
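The reference-counting mechanism in that quote is easiest to see with tee(), which duplicates pipe data without consuming it. A minimal sketch, assuming `fan_out` as a hypothetical helper name: tee() gives a second pipe extra references to the same pages, then splice() moves the original data on to another descriptor, so both destinations receive the bytes without the caller ever copying them.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: deliver the data waiting in pipe src_rd (up to len
 * bytes) both to pipe copy_wr and to out_fd. tee() adds references to the
 * same pipe-buffer pages and consumes nothing; splice() then moves the
 * original out of src_rd. Returns bytes delivered to each, or -1 on error. */
static ssize_t fan_out(int src_rd, int copy_wr, int out_fd, size_t len)
{
    ssize_t copied = tee(src_rd, copy_wr, len, 0);
    if (copied <= 0)
        return copied;              /* 0: nothing to duplicate; -1: error */

    for (ssize_t left = copied; left > 0; ) {
        ssize_t n = splice(src_rd, NULL, out_fd, NULL, (size_t)left,
                           SPLICE_F_MOVE);
        if (n <= 0)
            return -1;
        left -= n;
    }
    return copied;
}
```

Both fd arguments to tee() must be pipes; out_fd for the splice() step can be any descriptor splice accepts, such as a socket.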

answered Sep 20 '22 by Vality