With reference to: Is rename() atomic?
I'm asking something similar, but not quite the same, because what I want to know is is it safe to rely on the atomicty of rename()
when using NFS?
Here's a scenario I'm dealing with - I have an 'index' file that must always be present.
So:
Separate client:
This is making the assumption that rename()
being atomic means - there will always be an 'index' file (although, it might be an out of date version, because caching and timing)
However the problem I'm hitting is this - that this is happening on NFS - and working - but several of my NFS clients are occasionally reporting "ENOENT" - no such file or directory. (e.g. in hundreds operations happening at 5m intervals, we get this error every couple of days).
So what I'm hoping is whether someone can enlighten me - should it actually be impossible to get 'ENOENT' in this scenario?
The reason I'm asking is this entry in RFC 3530:
The RENAME operation must be atomic to the client.
I'm wondering if that means just the client issuing the rename, and not the client viewing the directory? (I'm ok with a cached/out of date directory structure, but the point of this operation is that this file will always be 'present' in some form)
Sequence of operations (from the client performing the write operation) is:
21401 14:58:11 open("fleeg.ext", O_RDWR|O_CREAT|O_EXCL, 0666) = -1 EEXIST (File exists) <0.000443>
21401 14:58:11 open("fleeg.ext", O_RDWR) = 3 <0.000547>
21401 14:58:11 fstat(3, {st_mode=S_IFREG|0600, st_size=572, ...}) = 0 <0.000012>
21401 14:58:11 fadvise64(3, 0, 572, POSIX_FADV_RANDOM) = 0 <0.000008>
21401 14:58:11 fcntl(3, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=1, len=1}) = 0 <0.001994>
21401 14:58:11 open("fleeg.ext.i", O_RDWR|O_CREAT, 0666) = 4 <0.000538>
21401 14:58:11 fstat(4, {st_mode=S_IFREG|0600, st_size=42, ...}) = 0 <0.000008>
21401 14:58:11 fadvise64(4, 0, 42, POSIX_FADV_RANDOM) = 0 <0.000006>
21401 14:58:11 close(4) = 0 <0.000011>
21401 14:58:11 fstat(3, {st_mode=S_IFREG|0600, st_size=572, ...}) = 0 <0.000007>
21401 14:58:11 open("fleeg.ext.i", O_RDONLY) = 4 <0.000577>
21401 14:58:11 fstat(4, {st_mode=S_IFREG|0600, st_size=42, ...}) = 0 <0.000007>
21401 14:58:11 fadvise64(4, 0, 42, POSIX_FADV_RANDOM) = 0 <0.000006>
21401 14:58:11 fstat(4, {st_mode=S_IFREG|0600, st_size=42, ...}) = 0 <0.000007>
21401 14:58:11 fstat(4, {st_mode=S_IFREG|0600, st_size=42, ...}) = 0 <0.000007>
21401 14:58:11 read(4, "\3PAX\1\0\0O}\270\370\206\20\225\24\22\t\2\0\203RD\0\0\0\0\17\r\0\2\0\n"..., 42) = 42 <0.000552>
21401 14:58:11 close(4) = 0 <0.000013>
21401 14:58:11 fcntl(3, F_SETLKW, {type=F_RDLCK, whence=SEEK_SET, start=466, len=68}) = 0 <0.001418>
21401 14:58:11 pread(3, "\21@\203\244I\240\333\272\252d\316\261\3770\361#\222\200\313\224&J\253\5\354\217-\256LA\345\253"..., 38, 534) = 38 <0.000010>
21401 14:58:11 pread(3, "\21@\203\244I\240\333\272\252d\316\261\3770\361#\222\200\313\224&J\253\5\354\217-\256LA\345\253"..., 38, 534) = 38 <0.000010>
21401 14:58:11 pread(3, "\21\"\30\361\241\223\271\256\317\302\363\262F\276]\260\241-x\227b\377\205\356\252\236\211\37\17.\216\364"..., 68, 466) = 68 <0.000010>
21401 14:58:11 pread(3, "\21\302d\344\327O\207C]M\10xxM\377\2340\0319\206k\201N\372\332\265R\242\313S\24H"..., 62, 300) = 62 <0.000011>
21401 14:58:11 pread(3, "\21\362cv'\37\204]\377q\362N\302/\212\255\255\370\200\236\350\2237>7i`\346\271Cy\370"..., 104, 362) = 104 <0.000010>
21401 14:58:11 pwrite(3, "\21\302\3174\252\273.\17\v\247\313\324\267C\222P\303\n~\341F\24oh/\300a\315\n\321\31\256"..., 127, 572) = 127 <0.000012>
21401 14:58:11 pwrite(3, "\21\212Q\325\371\223\235\256\245\247\\WT$\4\227\375[\\\3263\222\0305\0\34\2049A;2U"..., 68, 699) = 68 <0.000009>
21401 14:58:11 pwrite(3, "\21\262\20Kc(!.\350\367i\253hkl~\254\335H\250.d\0036\r\342\v\242\7\255\214\31"..., 38, 767) = 38 <0.000009>
21401 14:58:11 fsync(3) = 0 <0.001007>
21401 14:58:11 fstat(3, {st_mode=S_IFREG|0600, st_size=805, ...}) = 0 <0.000009>
21401 14:58:11 open("fleeg.ext.i.tmp", O_RDWR|O_CREAT|O_TRUNC, 0666) = 4 <0.001813>
21401 14:58:11 fstat(4, {st_mode=S_IFREG|0600, st_size=0, ...}) = 0 <0.000007>
21401 14:58:11 fadvise64(4, 0, 0, POSIX_FADV_RANDOM) = 0 <0.000007>
21401 14:58:11 write(4, "\3PAX\1\0\0qT2\225\226\20\225\24\22\t\2\0\205;D\0\0\0\0\17\r\0\2\0\n"..., 42) = 42 <0.000012>
21401 14:58:11 stat("fleeg.ext.i", {st_mode=S_IFREG|0600, st_size=42, ...}) = 0 <0.000011>
21401 14:58:11 fchmod(4, 0100600) = 0 <0.002517>
21401 14:58:11 fstat(4, {st_mode=S_IFREG|0600, st_size=42, ...}) = 0 <0.000008>
21401 14:58:11 close(4) = 0 <0.000011>
21401 14:58:11 rename("fleeg.ext.i.tmp", "fleeg.pax.i") = 0 <0.001201>
21401 14:58:11 close(3) = 0 <0.000795>
21401 14:58:11 munmap(0x7f1475cce000, 4198400) = 0 <0.000177>
21401 14:58:11 munmap(0x7f14760cf000, 4198400) = 0 <0.000173>
21401 14:58:11 futex(0x7f147cbcb908, FUTEX_WAKE_PRIVATE, 2147483647) = 0 <0.000010>
21401 14:58:11 exit_group(0) = ?
21401 14:58:11 +++ exited with 0 +++
NB - Paths and files renamed in the above for consistency. fleeg.ext
is the data file, and fleeg.ext.i
is the index. During this process - the fleeg.ext.i
file is being overwritten (by the .tmp
file), which is why the belief is that there should always be a file at that path (either the old one, or the new that's just overwritten it).
On the reading client the PCAP looks like LOOKUP
NFS call is what's failing:
124 1.375777 10.10.41.35 -> 10.10.41.9 NFS 226 LOOKUP fleeg.ext.i V3 LOOKUP Call, DH: 0x6fbbff3a/fleeg.ext.i
125 1.375951 10.10.41.9 -> 10.10.41.35 NFS 186 5347 LOOKUP 0775 Directory V3 LOOKUP Reply (Call In 124) Error: NFS3ERR_NOENT
126 1.375975 10.10.41.35 -> 10.10.41.9 NFS 226 LOOKUP fleeg.ext.i V3 LOOKUP Call, DH: 0x6fbbff3a/fleeg.ext.i
127 1.376142 10.10.41.9 -> 10.10.41.35 NFS 186 5347 LOOKUP 0775 Directory V3 LOOKUP Reply (Call In 126) Error: NFS3ERR_NOENT
I think the problem is not in the RENAME not being atomic, but in the fact that OPENing a file via NFS is not atomic.
NFS uses Filehandles; in order to do something to a file, a client first obtains a Filehandle through a LOOKUP, then the obtained Filehandle is used to perform the other requests. A minimum of two datagram is required, and the time between them can, in particular circumstances, be quite "large".
What is happening to you, I suppose, is that a client (client1) performs a LOOKUP; just after that, the LOOKUPed file gets erased as a result of RENAME (by client2); the Filehandle client1 has is no more valid, because it refers to an inode, not to a named path.
The reason for all this is that NFS aims to be stateless. More info in this PDF: http://pages.cs.wisc.edu/~remzi/OSTEP/dist-nfs.pdf
In pages 6 and 8 this behaviour is well explained.
Should it actually be impossible to get
ENOENT
in this scenario?
It is quite possible. The RFC 3530 says:
The operation is required to be atomic to the client.
That most likely means it must be atomic to the client invoking this operation, not all clients.
And further on it says:
If the target directory already contains an entry with the name... the existing target is removed before the rename occurs.
This is the reason other clients get ENOENT
sometimes.
In other words, rename
is not atomic on NFS.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With