How to retrieve a list of directories QUICKLY in Java?

Tags:

Suppose a very simple program that lists out all the subdirectories of a given directory. Sound simple enough? Except the only way to list all subdirectories in Java is to use FilenameFilter combined with File.list().

This works for the trivial case, but when the folder has say 150,000 files and 2 sub folders, it's silly waiting there for 45 seconds iterating through all the files and testing for file.isDirectory(). Is there a better way to list sub directories??

PS. Sorry, please save the lectures on having too many files in the same directory. Our live environment has this as part of the requirement.

923

asked Jun 23 '09 20:06

erotsppa

3 Answers

As has already been mentioned, this is basicly a hardware problem. Disk access is always slow, and most file systems aren't really designed to handle directories with that many files.

If you for some reason have to store all the files in the same directory, I think you'll have to maintain your own cache. This could be done using a local database such as sqlite, HeidiSQL or HSQL. If you want extreme performance, use a java TreeSet and cache it in memory. This means at the very least that you'll have to read the directory less often, and it could possibly be done in the background. You could reduce the need to refresh the list even further by using your systems native file update notification API (inotify on linux) to subscribe to changes to the directory.

This doesn't seem to be possible for you, but I once solved a similiar problem by "hashing" the files into subdirectories. In my case, the challenge was to store a couple of millions images with numeric ids. I constructed the directory structure as follows:

images/[id - (id % 1000000)]/[id - (id % 1000)]/[id].jpg

This has worked well for us, and it's the solution that I would recommend. You could do something similiar to alpha-numeric filenames by simply taking the first two letters of the filename, and then the next two letters. I've done this as well once, and it did the job as well.

answered Oct 16 '22 21:10

Emil H

Do you know the finite list of possible subdirectory names? If so, use a loop over all possible names and check for directory's existence.

Otherwise, you can not get ONLY directory names in most underlying OSs (e.g. in Unix, the directory listing is merely reading contents of "directory" file, so there's no way to find "just directories" quickly without listing all the files).

However, in NIO.2 in Java7 (see http://java.sun.com/developer/technicalArticles/javase/nio/#3 ), there's a way to have a streaming directory list so you don't get a full array of file elements cluttering your memory/network.

answered Oct 16 '22 20:10

DVK

There's actually a reason why you got the lectures: it's the correct answer to your problem. Here's the background, so that perhaps you can make some changes in your live environment.

First: directories are stored on the filesystem; think of them as files, because that's exactly what they are. When you iterate through the directory, you have to read those blocks from the disk. Each directory entry will require enough space to hold the filename, and permissions, and information on where that file is found on-disk.

Second: directories aren't stored with any internal ordering (at least, not in the filesystems where I've worked with directory files). If you have 150,000 entries and 2 sub-directories, those 2 sub-directory references could be anywhere within the 150,000. You have to iterate to find them, there's no way around that.

So, let's say that you can't avoid the big directory. Your only real option is to try to keep the blocks comprising the directory file in the in-memory cache, so that you're not hitting the disk every time you access them. You can achieve this by regularly iterating over the directory in a background thread -- but this is going to cause undue load on your disks, and interfere with other processes. Alternatively, you can scan once and keep track of the results.

The alternative is to create a tiered directory structure. If you look at commercial websites, you'll see URLs like /1/150/15023.html -- this is meant to keep the number of files per directory small. Think of it as a BTree index in a database.

Of course, you can hide that structure: you can create a filesystem abstraction layer that takes filenames and automatically generates the directory tree where those filenames can be found.

answered Oct 16 '22 22:10

kdgregory

Related questions
                            
                                Calling a JAVA method from C++ with JNI, no parameters
                            
                                Setting java.system.class.loader programmatically [duplicate]
                            
                                How to programmatically change the selection within package explorer
                            
                                How do I get database instance in grails?
                            
                                Releasing resources of Java 7 WatchService
                            
                                Set size of JLabel in FlowLayout
                            
                                Java - XMLGregorianCalendar Exception
                            
                                Disable taglib scanning in google app engine (Jetty)
                            
                                Why is a default constructor required for objects exported by a JAX-WS?
                            
                                Generate XSD files in Java
                            
                                How to install Xuggler?
                            
                                Will java compiler optimize a method call on a final static variable ? And what happens when it becomes dalvik code?
                            
                                How to Optimise a JPA Query
                            
                                Operation of RandomAccess in AbstractList.java
                            
                                IOUtils.copy and IOUtils.copyLarge are very slow
                            
                                Long incidental Young garbage collection pauses
                            
                                Java Rest - send parametrized List using GET method
                            
                                Implement search textField into jTable
                            
                                Get number of lines in property using resourcecount
                            
                                What are existing practices for encryption / decryption / key rotation for Java Web apps

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

How to retrieve a list of directories QUICKLY in Java?

Tags:

java

performance

file-io

filesystems

erotsppa

People also ask

3 Answers

Emil H

DVK

kdgregory

Recent Activity

Donate For Us