
Iterating a DirectoryStream and changing contents of a directory at the same time

The documentation of DirectoryStream clearly states:

The iterator is weakly consistent. It is thread safe but does not freeze the directory while iterating, so it may (or may not) reflect updates to the directory that occur after the DirectoryStream is created.

On my machine, I ran a simple iteration over a directory in debug mode. Before the iteration completed, I paused execution at a breakpoint, added a file to the directory being iterated, and resumed. The iteration did not see the extra file.

My question: under what circumstances will the iteration reflect updates to the directory contents? Unfortunately, the formal documentation is very vague about this, to say the least.

Vitaliy asked Aug 08 '13 17:08



2 Answers

The documentation is intentionally vague. The JVM has to run on many kinds of machines, including Windows and Unix derivatives, and different file systems behave differently. You must (I repeat, MUST) design for the worst case if you want your program to work reliably on more than one computer.

The principle of least surprise suggests that you should read the entire DirectoryStream into a collection to get a snapshot (or something very close to one), iterate over the snapshot, and then open a new stream and read it again. You can then compare successive snapshots to determine what changed in the underlying directory.
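A minimal sketch of that snapshot-and-compare approach might look like the following. The class `DirectorySnapshot` and its methods `snapshot`, `added`, and `removed` are hypothetical names chosen for illustration; only `Files.newDirectoryStream` and the collection classes come from the JDK. It assumes that comparing file names alone is enough, so a file that is modified in place is not detected.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper; the class and method names are illustrative only.
final class DirectorySnapshot {

    // Drains a fresh DirectoryStream into a set of file names and closes it.
    // The stream is held open only for as long as it takes to copy the entries.
    static Set<Path> snapshot(Path dir) throws IOException {
        Set<Path> entries = new HashSet<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path entry : stream) {
                entries.add(entry.getFileName());
            }
        }
        return entries;
    }

    // Names present in 'after' but missing from 'before'.
    static Set<Path> added(Set<Path> before, Set<Path> after) {
        Set<Path> result = new HashSet<>(after);
        result.removeAll(before);
        return result;
    }

    // Names present in 'before' but missing from 'after'.
    static Set<Path> removed(Set<Path> before, Set<Path> after) {
        return added(after, before);
    }
}
```

Each call to `snapshot` opens a new stream, because a DirectoryStream hands out its iterator only once. The snapshot itself is still only "very close" to consistent: a file created or deleted while the stream is being drained may or may not be included.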

Bob Dalgleish answered Nov 09 '22 06:11



As DirectoryStream is an interface, and as this part of NIO.2 is intended to be pluggable, don't limit your consideration to the implementations that ship with the JDK for Linux and Windows. It would be quite possible to write a custom implementation whose iteration does reflect concurrent updates, or for a clustered or distributed implementation to show that behaviour as a side effect.

The documentation is intentionally vague. On POSIX systems the implementation delegates to readdir, whose specification is also intentionally vague:

If a file is removed from or added to the directory after the most recent call to opendir() or rewinddir(), whether a subsequent call to readdir_r() returns an entry for that file is unspecified.

However, if you're after a concrete case where an implementation relied on that vagueness, then "Linux ext3 readdir and concurrent updates" shows a case where rsync, on an ext3 file system under high volume, appeared to see files show up in the directory out of the order in which they were created.
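To see what one particular machine does, a small probe along these lines may help. It is only a sketch: the class `WeakConsistencyProbe`, its methods, and the `initial-`/`late-` file names are hypothetical, and the result depends on the operating system, the file system, and the JDK in use, so no particular output should be expected.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical probe; names are illustrative and results are platform-dependent.
final class WeakConsistencyProbe {

    // Creates 'initial' files, opens a stream, and creates 'extra' files
    // right after the first entry has been read. Returns every name the
    // iterator yielded, in iteration order.
    static List<String> iterateWhileAdding(Path dir, int initial, int extra) throws IOException {
        for (int i = 0; i < initial; i++) {
            Files.createFile(dir.resolve("initial-" + i));
        }
        List<String> seen = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            boolean added = false;
            for (Path entry : stream) {
                seen.add(entry.getFileName().toString());
                if (!added) {
                    // Modify the directory while the iteration is in progress.
                    for (int i = 0; i < extra; i++) {
                        Files.createFile(dir.resolve("late-" + i));
                    }
                    added = true;
                }
            }
        }
        return seen;
    }

    // Counts how many of the late files the iteration actually reported.
    static long countLate(List<String> seen) {
        return seen.stream().filter(name -> name.startsWith("late-")).count();
    }
}
```

Calling `iterateWhileAdding` on an empty temporary directory and passing the result to `countLate` shows how many late files were observed on that run. On Linux, readdir implementations typically fetch entries from the kernel in batches, so a small directory may be read in full before the late files exist; a much larger `initial` may behave differently. Treat any result as an observation about one machine, not a guarantee.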

Joe answered Nov 09 '22 04:11
