I have a directory with 500,000 files in it. I would like to access them as quickly as possible. The algorithm requires me to repeatedly open and close them (I can't have 500,000 files open simultaneously).
How can I do this efficiently? I had originally thought that I could cache the inodes and open the files that way, but *nix doesn't provide a way to open files by inode (for security reasons or some such).
The other option is to just not worry about it and hope the filesystem does a good job of looking up files in a directory. If that is the best option, which filesystems would work best? Do certain filename patterns look up faster than others, e.g. 01234.txt vs foo.txt?
BTW this is all on Linux.
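One thing that should help regardless of the filesystem: open the directory once and open each file relative to that descriptor with openat(2), so the kernel doesn't re-resolve the directory's path components on every open. A minimal sketch, assuming sequentially numbered names (the directory name, naming pattern, and count are placeholders):

```c
/* Minimal sketch: cache one descriptor for the directory and open each
 * file relative to it with openat(2). "bigdir" and the %05d.txt naming
 * pattern are illustrative placeholders. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int dirfd = open("bigdir", O_RDONLY | O_DIRECTORY);
    if (dirfd < 0) { perror("open bigdir"); return 1; }

    char buf[4096];
    for (int i = 0; i < 500000; i++) {
        char name[32];
        snprintf(name, sizeof(name), "%05d.txt", i);

        int fd = openat(dirfd, name, O_RDONLY);  /* no directory re-resolution */
        if (fd < 0) { perror(name); continue; }

        ssize_t n = read(fd, buf, sizeof(buf));  /* stand-in for real processing */
        (void)n;
        close(fd);                               /* only one file open at a time */
    }

    close(dirfd);
    return 0;
}
```

The dentry cache will usually keep repeated lookups cheap after the first pass in any case; openat() mainly saves the repeated walk of the directory's own path.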
You can put 4,294,967,295 files into a single folder if the drive is formatted with NTFS (which would be unusual if it were not), as long as you do not exceed 256 terabytes (the maximum single file size and volume size) or the disk space actually available, whichever is less.
Maximum number of files on disk: 4,294,967,295. Maximum number of files in a single folder: 4,294,967,295.
Most modern filesystems do OK with that many files. Once you hit around 32k files in a directory, though, some filesystems, such as ext3 without dir_index, will start having serious performance issues.
As an aside, rsync beats rm -rf for deleting a huge directory in this benchmark: web.archive.org/web/20130929001850/http://linuxnote.net/… It's a good example of a faster way to destroy that many files.
Assuming your filesystem is ext3, your directory is indexed with a hashed B-tree if dir_index is enabled; that will give you as much of a boost as anything you could code into your app. You can check whether it is on with tune2fs -l /dev/sdXN | grep dir_index (substitute your own device), and enable it with tune2fs -O dir_index followed by e2fsck -D to reindex existing directories.
If the directory is indexed, your file naming scheme shouldn't matter.
http://lonesysadmin.net/2007/08/17/use-dir_index-for-your-new-ext3-filesystems/
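If you want to verify on your own filesystem that the naming scheme doesn't matter, a micro-benchmark is easy to write. A rough sketch (the directory name, file count, and %05d.txt pattern are placeholders); run it once per naming scheme over identically populated directories and compare:

```c
/* Rough sketch: time open()+close() over many files to compare how
 * fast name lookups are under different naming schemes. With dir_index
 * enabled, the schemes should come out about the same. */
#include <fcntl.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
    struct timespec t0, t1;
    int opened = 0;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < 100000; i++) {
        char name[64];
        snprintf(name, sizeof(name), "bigdir/%05d.txt", i);  /* placeholder names */
        int fd = open(name, O_RDONLY);
        if (fd >= 0) { opened++; close(fd); }
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("%d opens in %.3f s (%.1f us each)\n", opened, secs,
           opened ? secs * 1e6 / opened : 0.0);
    return 0;
}
```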