I've implemented (in Java) a fairly straightforward Iterator to return the names of the files in a recursive directory structure, and after about 2300 files it failed with "Too many open files in system" (the failure was actually in trying to load a class, but I assume the directory listing was the culprit).
The data structure maintained by the iterator is a Stack holding the contents of the directories that are open at each level.
The actual logic is fairly basic:
private static class DirectoryIterator implements Iterator<String> {
    private Stack<File[]> directories;
    private FilenameFilter filter;
    private Stack<Integer> positions = new Stack<Integer>();
    private boolean recurse;
    private String next = null;
    private int count = 0; // debugging counter, printed every 100 files

    public DirectoryIterator(Stack<File[]> directories, boolean recurse, FilenameFilter filter) {
        this.directories = directories;
        this.recurse = recurse;
        this.filter = filter;
        positions.push(0);
        advance();
    }

    public boolean hasNext() {
        return next != null;
    }

    public String next() {
        String s = next;
        advance();
        return s;
    }

    public void remove() {
        throw new UnsupportedOperationException();
    }

    // Sets 'next' to the next file name, descending into subdirectories as needed.
    private void advance() {
        if (directories.isEmpty()) {
            next = null;
        } else {
            File[] files = directories.peek();
            // Pop any directory listings that are exhausted.
            while (positions.peek() >= files.length) {
                directories.pop();
                positions.pop();
                if (directories.isEmpty()) {
                    next = null;
                    return;
                }
                files = directories.peek();
            }
            File nextFile = files[positions.peek()];
            if (nextFile.isDirectory()) {
                int p = positions.pop() + 1;
                positions.push(p);
                if (recurse) {
                    // Push the subdirectory's listing and continue from its first entry.
                    directories.push(nextFile.listFiles(filter));
                    positions.push(0);
                    advance();
                } else {
                    advance();
                }
            } else {
                next = nextFile.toURI().toString();
                count++;
                if (count % 100 == 0) {
                    System.err.println(count + " " + next);
                }
                int p = positions.pop() + 1;
                positions.push(p);
            }
        }
    }
}
I would like to understand how many "open files" this requires. Under what circumstances is this algorithm "opening" a file, and when does it get closed again?
I've seen some neat code using Java 7 or Java 8, but I'm constrained to Java 6.
When you call nextFile.listFiles(), an underlying file descriptor is opened to read the directory. There is no way to explicitly close this descriptor, so you are relying on garbage collection. As your code descends a deep tree, it is essentially accumulating a stack of nextFile references that cannot be garbage collected.
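If you want to check whether descriptors really are accumulating, one rough way on Linux (this is an illustrative sketch, not part of your iterator; /proc is Linux-specific and the class name is made up) is to count the entries under /proc/self/fd while the iteration runs:

    import java.io.File;

    // Rough diagnostic: on Linux, each entry in /proc/self/fd is an open
    // descriptor belonging to the current JVM process. Sampling the count
    // periodically while the iterator runs shows whether descriptors pile up.
    public class FdCount {
        public static int openDescriptors() {
            String[] fds = new File("/proc/self/fd").list();
            return fds == null ? -1 : fds.length; // -1 if /proc is unavailable
        }

        public static void main(String[] args) {
            System.err.println("open descriptors: " + openDescriptors());
        }
    }

Calling openDescriptors() every hundred files or so, next to your existing count printout, would show whether the number climbs toward the system limit.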
Step 1: set nextFile = null before calling advance(). This drops the reference so the object becomes eligible for garbage collection.
Step 2: you may need to call System.gc() after nulling nextFile to encourage prompt collection. Unfortunately, System.gc() is only a hint; there is no way to force a collection.
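A minimal sketch of how the directory branch of advance() could look with steps 1 and 2 applied (the System.gc() call is optional and only a hint):

    if (nextFile.isDirectory()) {
        int p = positions.pop() + 1;
        positions.push(p);
        if (recurse) {
            // Push the subdirectory's listing before letting go of nextFile.
            directories.push(nextFile.listFiles(filter));
            positions.push(0);
        }
        nextFile = null;   // Step 1: drop the reference before recursing
        System.gc();       // Step 2 (optional): hint the VM to collect now
        advance();
    }

This is equivalent to your original logic; the only changes are clearing the local reference and hinting the collector before the recursive call.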
Step 3: you may need to increase the open file limit on your operating system. On Linux you can inspect the current limit with ulimit -n and raise it in the shell that launches the JVM, e.g. ulimit -n 4096.
If you can migrate to Java 7 or later, DirectoryStream will solve your problem. Instead of using nextFile.listFiles(), use Files.newDirectoryStream(nextFile.toPath()) to get a DirectoryStream. You can then iterate over the stream and close() it to release the operating system resources. Each returned Path can be converted back to a File with toFile(). However, you might prefer to refactor to use Path throughout instead of File.
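A rough Java 7 sketch of that approach (the helper name is illustrative): it lists one directory level eagerly, so the descriptor is closed by try-with-resources before you recurse into any subdirectory:

    import java.io.File;
    import java.io.IOException;
    import java.nio.file.DirectoryStream;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.ArrayList;
    import java.util.List;

    public class DirListing {
        // Reads one directory level into memory so the underlying descriptor
        // is released before any recursion into subdirectories happens.
        static List<File> listDirectory(File dir) throws IOException {
            List<File> result = new ArrayList<File>();
            try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir.toPath())) {
                for (Path entry : stream) {
                    result.add(entry.toFile());
                }
            } // stream.close() runs here, releasing the OS resource
            return result;
        }
    }

You could then push the returned List<File> (or a File[] built from it) onto your existing stack in place of the listFiles() result, keeping the rest of the iterator unchanged.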