Does Linux have zero-copy? splice or sendfile?

Tags: c, linux, zero-copy

When splice was introduced, it was discussed on the kernel mailing list that sendfile had been re-implemented on top of splice. The documentation for the splice flag SPLICE_F_MOVE states:

Attempt to move pages instead of copying. This is only a hint to the kernel: pages may still be copied if the kernel cannot move the pages from the pipe, or if the pipe buffers don't refer to full pages. The initial implementation of this flag was buggy: therefore starting in Linux 2.6.21 it is a no-op (but is still permitted in a splice() call); in the future, a correct implementation may be restored.

So does that mean Linux has no zero-copy method for writing to sockets? Or was this fixed at some point, and nobody has updated the documentation in years? Does either sendfile or splice have a zero-copy implementation in any of the latest 3.x kernel versions?

Since Google has no answer to this query, I'm creating a Stack Overflow question for the next poor schmuck who wants to know whether there's any benefit to using vmsplice and splice, or sendfile, over plain old write.

asked Jun 17 '14 00:06

Eloff

2 Answers

sendfile has always been, and still is, zero-copy (assuming the hardware allows for it, which is usually the case). Being zero-copy was the entire point of having this syscall in the first place. Nowadays sendfile is implemented as a wrapper around splice.

That suggests that splice, too, is zero-copy, and this is indeed the case. At least in theory, and at least in some cases. The problem is figuring out how to correctly use it so it works reliably and so it is zero-copy. The documentation is... sparse, to say the least.

In particular, splice is only zero-copy if the pages were given as a "gift" (the SPLICE_F_GIFT flag), i.e. you formally don't own them any more (though in reality you still do). That is a non-issue if you simply splice a file descriptor onto a socket, but it is a big issue if you want to splice data from your application's address space, or from one pipe to another. It is unclear what to do with the pages afterwards (and when). The documentation states that you may never touch the pages again or do anything with them. So if you follow the letter of the documentation, you must leak the memory.
That's obviously not correct (it can't be), but there is no good way for you to know when it's safe to reuse or release that memory. The kernel doing a sendfile would know: as soon as it receives the TCP ACK, it knows that the data is never needed again. The problem is that you never get to see the ACK. All you know when splice has returned is that the data has been accepted for sending (you have no idea whether it has already been sent or received, nor when that will happen).
This means you need to figure it out at the application layer, either by doing manual ACKs (which you already have if you run a reliable protocol on top of UDP), or by assuming that if the other side sends an answer to your request, it obviously must have received the request.

Another thing you have to manage is the finite pipe space. The default capacity is small (16 pages, i.e. 64 KiB with 4 KiB pages), and even if you increase it (fcntl with F_SETPIPE_SZ), you can't just naively splice a file of any size in one call. sendfile, on the other hand, will just let you do that, which is cool.

All in all, sendfile is nice because it just works, and it works well, and you don't need to care about any of the above details. It's not a panacea, but it sure is a great addition.
I would, personally, stay away from splice and its family until the whole thing is greatly overhauled and until it is 100% clear what you have to do (and when) and what you don't have to do.

The real, effective gains over plain old write are marginal for most applications anyway. I recall some less-than-polite comments by Mr. Torvalds a few years ago (when BSD had a form of write that would do some magic with remapping pages to get zero-copy, and Linux didn't) which pointed out that making a copy usually isn't an issue, but playing tricks with pages is [won't repeat that here].

answered Sep 19 '22 00:09

Damon

To quote the splice man page as of 2014-07-08:

Though we talk of copying, actual copies are generally avoided. The kernel does this by implementing a pipe buffer as a set of reference-counted pointers to pages of kernel memory. The kernel creates "copies" of pages in a buffer by creating new pointers (for the output buffer) referring to the pages, and increasing the reference counts for the pages: only pointers are copied, not the pages of the buffer.

Therefore, yes: splice is documented as being zero-copy in most cases.

answered Sep 20 '22 00:09

Vality
