
Why does Apache recommend against using sendfile() with NFS on Linux?

The Apache documentation includes this statement for EnableSendfile:

With a network-mounted DocumentRoot (e.g., NFS, SMB, CIFS, FUSE), the kernel may be unable to serve the network file through its own cache.[1]

The default configuration for Apache 2.4 and Nginx disables sendfile().

I'm trying to find something concrete that describes exactly what the problem is when using sendfile() with NFS filesystems on Linux. Running a minimal test program on kernel 3.10.0-327.36.3 (CentOS 7) verifies that sendfile() does work when the source is on NFS, and that it does read from the page cache: the first run is slow, subsequent runs are fast, and after drop_caches it is slow again (i.e. the file is re-read from the source). I tried file sizes up to 1G and everything seemed to work OK. I'm assuming there must be some set of circumstances that reveals buggy behaviour, but I'd like to know exactly what that is.
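A test along those lines could look like the sketch below. It is not the program used in the question (that was never posted); the names timed_sendfile and drain are hypothetical, and it pushes the file into a local socket pair rather than over a real network connection.

```python
import os
import socket
import threading
import time


def drain(sock, counter):
    """Read from sock until EOF and record how many bytes arrived."""
    total = 0
    while True:
        chunk = sock.recv(1 << 16)
        if not chunk:
            break
        total += len(chunk)
    counter.append(total)


def timed_sendfile(path):
    """Send the whole file at path through sendfile() into a socket pair.

    Returns (bytes_received, elapsed_seconds).
    """
    tx, rx = socket.socketpair()
    received = []
    reader = threading.Thread(target=drain, args=(rx, received))
    reader.start()
    start = time.perf_counter()
    try:
        with open(path, "rb") as src:
            size = os.fstat(src.fileno()).st_size
            offset = 0
            while offset < size:
                # sendfile() may move fewer bytes than requested, so loop.
                sent = os.sendfile(tx.fileno(), src.fileno(), offset, size - offset)
                if sent == 0:
                    break
                offset += sent
    finally:
        tx.close()  # signals EOF to the reader thread
        reader.join()
        rx.close()
    return received[0], time.perf_counter() - start
```

Calling timed_sendfile() twice on a file under the NFS mount, and once more after writing 3 to /proc/sys/vm/drop_caches as root, should show the slow, fast, slow pattern if the page cache is being used.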

For comparison, there's some documentation out there about the problems VirtualBox volumes have with sendfile()[2], but I can't find anything similar covering Apache, or how to replicate a problematic configuration.

  • [1] https://httpd.apache.org/docs/2.4/mod/core.html#enablesendfile
  • [2] https://www.virtualbox.org/ticket/12597
asked Sep 22 '17 14:09 by protospark


1 Answer

The default configuration for Nginx turns sendfile on (https://github.com/nginx/nginx/blob/release-1.13.8/conf/nginx.conf#L27), so I'm confused about your statement there.

Back in the early 2000s an Apache developer introduced the option to disable sendfile (there is a mailing list post for the patch). There are also old bugs in the Apache bug tracker that might have been related to sendfile. From Apache bug #12893 we learn that one of the failures seen was because the Linux in-kernel NTFS implementation simply didn't support the sendfile syscall at all:

[...] apparently there is some characteristic of your NTFS filesystem that prevents sendfile() from working.

sendfile(8, 9, [0], 9804)               = -1 EINVAL (Invalid argument)

A blog post titled "The Mysterious Case of Sendfile and Apache", which references the Stack Overflow question you're reading, puts forward the following theory:

sendfile() will transfer at most 0x7ffff000 (2,147,479,552) bytes, returning the number of bytes actually transferred. (This is true on both 32-bit and 64-bit systems.)

There's a 2GB limitation. Now here's the assumption, the apache documentation says:

With a network-mounted DocumentRoot (e.g., NFS, SMB, CIFS, FUSE), the kernel may be unable to serve the network file through its own cache[2]

So when it says 'the kernel may be unable to serve the file' I think we might be referring here to the inherent limit on file size that sendfile has.

It's an interesting theory, but I doubt this is the answer because you could simply choose not to use the sendfile code path on files that are too big. Update: while digging around I found that the author of that post wrote a follow-up titled "That Time I Was Wrong About Sendfile() and Apache", which mentions the answer you're reading!
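Since the quoted limit applies per call and sendfile() reports how many bytes it actually moved, a caller can also cover a larger file with several calls. The sketch below only does the arithmetic for that; plan_sendfile_calls is a hypothetical helper, and it assumes the best case where every call transfers its full count.

```python
MAX_PER_CALL = 0x7FFFF000  # 2,147,479,552 bytes, the per-call cap quoted above


def plan_sendfile_calls(file_size, max_per_call=MAX_PER_CALL):
    """Return the (offset, count) pairs a caller would request, in order.

    Best case only: each call is assumed to transfer its full count.
    A real sendfile() may return less, in which case the caller simply
    continues from the new offset.
    """
    plan = []
    offset = 0
    while offset < file_size:
        count = min(max_per_call, file_size - offset)
        plan.append((offset, count))
        offset += count
    return plan
```

For a 5 GiB file, plan_sendfile_calls(5 * 2**30) yields three requests, the first two at the cap and the third covering the remainder.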

There are also warnings about sendfile problems in the ProFTPD documentation:

There have been cases where it was the filesystems, rather than the kernels, which appeared to have been the culprits in sendfile(2) problems:

  • Network filesystems (e.g. NFS, SMBFS/Samba, CIFS)
  • Virtualized filesystems (OpenVZ, VMware, and even Veritas)
  • Other filesystems (e.g. NTFS and tmpfs on Linux)

Again, if you encounter issues with downloading files from ProFTPD when those files reside on a networked or virtualized filesystem, try using "UseSendfile off" in your proftpd.conf.

That's a lot of "here be dragons" warnings. Some of these exist because the filesystem simply didn't support sendfile (e.g. until 2.4.22-pre3 Linux's tmpfs didn't support sendfile). FUSE-based filesystems (such as NTFS-3g) would also have had problems in the past due to FUSE and sendfile bugs (since ironed out). The list of virtualized filesystems is an interesting addition though...
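When the failure is simply "this filesystem does not support sendfile", as in the EINVAL trace above, an application can fall back to ordinary reads and writes. The following is a minimal sketch of that idea, not what Apache or ProFTPD actually do; copy_with_fallback is a hypothetical name, and the sendfile parameter exists only so the failure can be simulated.

```python
import errno
import os


def copy_with_fallback(in_fd, out_fd, size, sendfile=os.sendfile):
    """Copy size bytes from the start of in_fd to out_fd.

    Returns "sendfile" or "read/write" to show which path finished the copy.
    """
    offset = 0
    try:
        while offset < size:
            sent = sendfile(out_fd, in_fd, offset, size - offset)
            if sent == 0:  # unexpected EOF on the source
                return "sendfile"
            offset += sent
        return "sendfile"
    except OSError as exc:
        # EINVAL / ENOSYS are treated here as "sendfile unsupported";
        # anything else is a real error and is re-raised.
        if exc.errno not in (errno.EINVAL, errno.ENOSYS):
            raise

    # Fallback: resume from wherever sendfile stopped.
    while offset < size:
        chunk = os.pread(in_fd, min(65536, size - offset), offset)
        if not chunk:
            break
        view = memoryview(chunk)
        while view:
            written = os.write(out_fd, view)
            view = view[written:]
        offset += len(chunk)
    return "read/write"
```

Note that this only covers the "not supported" class of failure. It does nothing for the stale-data problem discussed next, where sendfile succeeds but serves old cached contents.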

However the OrangeFS FAQ seems to have the most plausible explanation:

5.16 Can we run the Apache webserver to serve files off a orangefs volume?

Sure you can! However, we recommend that you turn off the EnableSendfile option in httpd.conf before starting the web server. Alternatively, you could configure orangefs with the option -enable-kernel-sendfile. Passing this option to configure results in a orangefs kernel module that supports the sendfile callback. But we recommend that unless the files that are being served are large enough this may not be a good idea in terms of performance. Apache 2.x+ uses the sendfile system call that normally stages the file-data through the page-cache. On recent 2.6 kernels, this can be averted by providing a sendfile callback routine at the file-system. Consequently, this ensures that we don't end up with stale or inconsistent cached data on such kernels. However, on older 2.4 kernels the sendfile system call streams the data through the page-cache and thus there is a real possibility of the data being served stale. Therefore users of the sendfile system call are warned to be wary of this detail.

A similar explanation appears in the VirtualBox bug "Linux guest readv system call returns stale (cached) shared folder file data":

I have discovered that programs that read files using the read system call return the correct data, but those using the readv system call (such as my version of gas) read stale cached data.

[...]

the use of kernel function generic_file_read_iter as the .read_iter member of the file_operations structure (.read_iter is used when doing a readv system call). This function WILL write to and read from the file cache. However, vbox function sf_reg_read, as used for the generic .read member and read system call, appears to always bypass Linux's FS cache.

[...]

Further I believe that a similar long-lived issue is reported as ticket #819, only for the sendfile system call. It seems that all of these generic_file_* functions have the expectation that the host controls all access to the drive.

The above may explain ProFTPD's list of problematic virtualized filesystems too.

Summary (best guess)

Apache recommends against using sendfile() with Linux NFS because its software is popular and triggered many painful-to-debug, sendfile-related bugs with older Linux NFS clients. The warning is old, and it's probably easier to leave it as-is than to update it with all the caveats.

If the data underlying a Linux filesystem can change without the Linux page cache being invalidated, it's unwise to use sendfile with it on old Linux kernels (this explains the old Linux NFS client issues). On newer kernels, using sendfile is still unwise if such a filesystem doesn't implement its own sendfile hook (the VirtualBox shared folder issue demonstrates this).

Recent Linux kernels (2.6.31 and above) let filesystems that might face this invalidation problem supply their own sendfile implementation. Assuming the filesystem does so, it should be fine to use with sendfile, barring bugs, but caveat emptor!
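If you want to act on this cautiously, one option is to check which filesystem type a DocumentRoot sits on before leaving sendfile enabled. The sketch below parses text in the /proc/mounts format; the function names are hypothetical, and the set of "risky" types is an illustrative assumption drawn from the filesystems named in the quoted documentation, not an authoritative list.

```python
import os

# Assumption: illustrative set based on the filesystems named in the
# Apache, ProFTPD and VirtualBox material quoted above.
NETWORK_FS_TYPES = {"nfs", "nfs4", "cifs", "smbfs", "smb3", "vboxsf"}


def fs_type_for(path, mounts_text):
    """Return the filesystem type of the longest mount point containing path.

    mounts_text uses the /proc/mounts layout: device, mount point, type, ...
    Mount points with escaped characters (e.g. \\040 for a space) are not
    decoded in this sketch.
    """
    path = os.path.abspath(path)
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        # ">=" lets a later mount on the same mount point win.
        if (path + "/").startswith(prefix) and len(mount_point) >= len(best_mount):
            best_mount, best_type = mount_point, fs_type
    return best_type


def sendfile_looks_risky(path, mounts_text):
    """True if path is on a network, FUSE or shared-folder filesystem."""
    fs_type = fs_type_for(path, mounts_text) or ""
    return fs_type in NETWORK_FS_TYPES or fs_type.startswith("fuse")
```

On a live system you would pass in the contents of /proc/mounts, for example sendfile_looks_risky("/srv/www", open("/proc/mounts").read()), and set "EnableSendfile Off" if it returns True.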

answered Sep 29 '22 15:09 by Anon