It appears that nio's .list returns a stream which, when consumed, holds on to one file descriptor per iterated file until .close is called on the entire stream. This means that data directories with upwards of 1,000 files can easily brush against common ulimit values. This file descriptor accumulation is further exacerbated by nested traversals.
What might be an alternative way to iterate over the files of large directories, short of spawning calls to the OS file-listing command? Ideally, iterating over the files of a large directory would keep a file descriptor open only for the currently iterated file, as proper stream semantics would imply.
Edit:
list returns a Java Stream of java.nio.file.Path.
Which API call could be used to close each item on the stream once it has been processed, rather than only when the entire stream is closed, for leaner iteration? In Scala, this can easily be fiddled with using the API wrapper from better-files, starting from here.
If that happens, why not use the old-school java.io.File?
File folder = new File(pathToFolder);
String[] files = folder.list();
I tested with lsof
and it looks like none of the listed files is held open. You can convert the array to a list or stream afterwards. Unless the directory is too large or remote; in that case, I would suspect the Path objects and try to garbage-collect or otherwise dispose of them.
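The array-to-stream conversion mentioned above can be sketched as follows. This is just a sketch; the temp directory is a placeholder for your own data directory:

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

public class ListDirSketch {
    public static void main(String[] args) {
        // Placeholder path; substitute your own data directory.
        File folder = new File(System.getProperty("java.io.tmpdir"));
        String[] names = folder.list(); // returns null on I/O error or non-directory
        if (names != null) {
            // Convert the eagerly-loaded array to a list or stream.
            List<String> asList = Arrays.asList(names);
            Stream<String> asStream = Arrays.stream(names);
            System.out.println(asList.size() + " entries, streamed: " + asStream.count());
        }
    }
}
```

Note that list() reads the whole directory into memory up front, so there is no lingering per-file handle to close afterwards.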
I ran into the same issue (on Windows Server 2012 R2) when I didn't close the stream. All the files I iterated over were open in read mode until the JVM was shut down. However, it did not occur on Mac OS X, and since the stream depends on OS-specific implementations of FileSystemProvider and DirectoryStream, I assume the issue can be OS-dependent, too.
Contrary to @Ian McLaird's comment, the Files.list() documentation does state that
If timely disposal of file system resources is required, the try-with-resources construct should be used to ensure that the stream's close method is invoked after the stream operations are completed.
The returned stream is a DirectoryStream, whose Javadoc says:
A DirectoryStream is opened upon creation and is closed by invoking the close method. Closing a directory stream releases any resources associated with the stream. Failure to close the stream may result in a resource leak.
My solution was to follow this advice and use the try-with-resources construct:
try (Stream<Path> fileListing = Files.list(directoryPath)) {
    // use the fileListing stream
}
When I closed the stream properly (using the above try-with-resources construct), the file handles were released immediately.
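Since the question also mentions nested traversals: Files.walk likewise returns a lazy stream backed by open directory resources, so as far as I understand the same try-with-resources pattern applies there. A minimal sketch, using the current directory as a placeholder root:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class WalkSketch {
    public static void main(String[] args) throws IOException {
        Path root = Paths.get("."); // placeholder root for the nested-traversal case
        // Files.walk holds directory resources open while the stream is live,
        // so close it the same way as Files.list.
        try (Stream<Path> paths = Files.walk(root)) {
            long count = paths.filter(Files::isRegularFile).count();
            System.out.println(count + " regular files under " + root);
        } // the stream and its directory handles are released here
    }
}
```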
If you don't care about getting the files as a stream, or you are OK with loading the whole file list into memory and converting it to a stream yourself, you can use the IO API:
File directory = new File("/path/to/dir");
File[] files = directory.listFiles();
if (files != null) { // 'files' can be null if 'directory' "does not denote a directory, or if an I/O error occurs."
// use the 'files' array or convert to a stream:
Stream<File> fileStream = Arrays.stream(files);
}
I did not experience any file-locking issues with this one. However, note that both solutions rely on native, OS-dependent code, so I advise testing in every environment you will be using.
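A related lazy alternative, if the eager java.io approach does not suit, is Files.newDirectoryStream, which iterates entries without loading them all into memory; it is also a closeable resource, so the same try-with-resources discipline applies. A small sketch, using the temp directory as a placeholder:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class DirStreamSketch {
    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(System.getProperty("java.io.tmpdir")); // placeholder directory
        // DirectoryStream is lazy: entries are read as you iterate,
        // and the underlying directory handle is released on close.
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                System.out.println(entry.getFileName());
            }
        } // directory handle released here
    }
}
```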