
Do network file systems pre-fetch? (Or: do internet file systems make optimizations to reduce round trips?)

Take the following code snippet:

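 /* Open a file that lives on a remotely mounted file system. */
 f = open("/mnt/remoteserver/bar/foo.bin", O_RDONLY);
 while (true)
 {
     /* Read sequentially, 1000 bytes at a time, until EOF or error. */
     bytesread = read(f, buffer, 1000);
     if (bytesread > 0)
         ProcessBytes(buffer, bytesread);
     else
         break;
 }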

In the example above, let's say the remote file, foo.bin, is 1 MB and has never been accessed by the client before. That's approximately 1,000 calls to read() to get the entire file.

Further, let's say the server whose directory is mounted on the client is reached over the internet rather than a local network: high bandwidth to the client, but long latency.

Does every read() call invoke a round trip back to the server to ask for more data? Or does the client/server protocol recognize that subsequent reads on a remote file are often sequential, so that subsequent blocks are pushed down before the application has actually made a read() call for them? If so, subsequent read() calls would return faster because the data was pre-fetched and cached.
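
To put a number on the worst case, here is a minimal sketch of the arithmetic, assuming every read() costs one full round trip (no caching, no prefetching). The function names and the 100 ms round-trip time mentioned afterwards are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Number of read() calls needed to consume file_bytes in requests of
   read_bytes each (ceiling division). */
size_t read_calls_needed(size_t file_bytes, size_t read_bytes)
{
    assert(read_bytes > 0);
    return (file_bytes + read_bytes - 1) / read_bytes;
}

/* Time spent waiting on latency alone, ASSUMING one network round trip
   per read() call. Real clients may cache or batch, so treat this as an
   upper bound for the scenario in the question, not a prediction. */
double worst_case_latency_seconds(size_t file_bytes, size_t read_bytes,
                                  double rtt_seconds)
{
    return (double)read_calls_needed(file_bytes, read_bytes) * rtt_seconds;
}
```

Under those assumptions, worst_case_latency_seconds(1000000, 1000, 0.1) comes to roughly 100 seconds of pure waiting, regardless of how much bandwidth is available.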

Do modern network file system protocols (NFS, SMB/Samba, any others?) make any optimizations like this? Are there network file system protocols tuned for the internet that have optimizations like this?

I'm investigating a personal project that may involve implementing a network file system over the internet. It struck me that performance could improve if the number of round trips for file I/O were reduced.

selbie asked Oct 25 '22 16:10



1 Answer

This is going to be very dependent on the protocol implementation. In general, I don't think most client implementations prefetch, but most savvy storage admins use large block sizes (32 KB or more; see the rsize/wsize mount options), which has effectively the same result. Network file systems are also typically cached through the system's buffer cache, so read() calls will not translate one-to-one into network I/O.
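
On the application side, the same idea can be sketched by issuing larger read() requests and passing a sequential-access hint. This is only an illustration: count_read_calls is a hypothetical helper, it counts application-level read() calls rather than on-wire requests (those are governed by rsize and the client's caching), and posix_fadvise is purely advisory, so whether a given network file system client acts on it is an assumption you would need to verify.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Read a file sequentially in `chunk`-byte requests and return how many
   read() calls returned data, or -1 on error. */
long count_read_calls(const char *path, size_t chunk)
{
    assert(chunk > 0);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* Advisory hint that access will be sequential. The kernel or the
       file system client is free to ignore it. */
    (void)posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);

    char *buffer = malloc(chunk);
    if (buffer == NULL) {
        close(fd);
        return -1;
    }

    long calls = 0;
    ssize_t n;
    while ((n = read(fd, buffer, chunk)) > 0)
        calls++;

    free(buffer);
    close(fd);
    return (n < 0) ? -1 : calls;
}
```

For a 100,000-byte regular file, a 1000-byte chunk needs 100 data-returning calls while a 32 KB chunk needs 4, which is the kind of reduction large block sizes aim for.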

My advice would be to write your program naively (or write a simple test case), get comfortable reading the network stats via nfsstat and similar tools, and then optimize from there. There are far too many variables to get the answer any other way.
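
A simple test case along those lines might look like the sketch below. It times a naive sequential read with a monotonic clock so that different chunk sizes or mount options can be compared. The helper name time_sequential_read is hypothetical, and the numbers only mean something on a cold cache, since a second run over the same file may be served from the client's cache.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

/* Read `path` sequentially in `chunk`-byte requests. Stores the number
   of bytes read in *total_bytes and returns elapsed wall-clock seconds,
   or -1.0 on error. */
double time_sequential_read(const char *path, size_t chunk,
                            long long *total_bytes)
{
    assert(chunk > 0 && total_bytes != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1.0;

    char *buffer = malloc(chunk);
    if (buffer == NULL) {
        close(fd);
        return -1.0;
    }

    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);

    long long total = 0;
    ssize_t n;
    while ((n = read(fd, buffer, chunk)) > 0)
        total += n;

    clock_gettime(CLOCK_MONOTONIC, &end);
    free(buffer);
    close(fd);
    if (n < 0)
        return -1.0;

    *total_bytes = total;
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}
```

Running it against the mounted path with a few chunk sizes, and checking nfsstat before and after each run, gives a rough picture of how application reads map to network requests on a particular setup.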

I'm no expert, but from what I can tell NFSv4 has more WAN optimizations than the older protocols (NFSv2, NFSv3, CIFS), so I'd definitely factor it into your mix. That said, most remote file system protocols aren't really designed for high-latency access, which is why we end up with systems like S3, which are.

easel answered Oct 27 '22 11:10
