
What is better for performance - many files in one directory, or many subdirectories each with one file?

When building web applications, we often have files associated with database entries. For example, we have a user table where each user has an avatar field that holds the path to the associated image.

To make sure there are no conflicts in filenames we can either:

  • rename files upon upload to ID.jpg; the path would then be /user-avatars/ID.jpg
  • or create a sub-directory for each entity and leave the original filename intact; the path would then be /user-avatars/ID/original_filename.jpg

where ID is the user's unique ID number.
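As a rough illustration only, the two schemes could be sketched in Python like this. The function names and the `.jpg` extension handling are hypothetical; only the `/user-avatars` root and the path shapes come from the list above.

```python
from pathlib import PurePosixPath

# Root directory as written in the question.
AVATAR_ROOT = PurePosixPath("/user-avatars")


def flat_path(user_id: int) -> PurePosixPath:
    """Scheme 1: rename the upload to ID.jpg inside one shared directory."""
    return AVATAR_ROOT / f"{user_id}.jpg"


def per_user_dir_path(user_id: int, original_filename: str) -> PurePosixPath:
    """Scheme 2: one sub-directory per user, original filename kept."""
    # Keep only the final path component so an uploaded name such as
    # "../../x.jpg" cannot point outside the user's directory.
    safe_name = PurePosixPath(original_filename).name
    return AVATAR_ROOT / str(user_id) / safe_name
```

For example, `flat_path(42)` gives `/user-avatars/42.jpg`, while `per_user_dir_path(42, "me.jpg")` gives `/user-avatars/42/me.jpg`.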

Both are perfectly valid from the application logic's point of view.

But which one would be better from a filesystem performance point of view? We have to keep in mind that the number of user entries can be very high (millions).

Is there a limit on the number of sub-directories a directory can hold?

ioleo asked Jul 24 '13 09:07




2 Answers

It depends on your file system, but I'll assume you mean something simple like ext3 and that you're not running a distributed file system (some of which handle this quite well). In general, file systems perform poorly beyond a certain number of entries in a single directory, regardless of whether those entries are directories or files. So whether you create one directory per image or put every image in the root directory, you will run into scaling problems. If you look at this answer:

How many files in a directory is too many (on Windows and Linux)?

You'll see that ext3 caps a single directory at about 32K subdirectories, far fewer than you're proposing.

Off the top of my head, I'd suggest doing some rudimentary sharding into a multilevel directory tree, something like /user-avatars/1/2/12345/original_filename.jpg. (Or something appropriate for your type of ID, but I am interpreting your question to be about numeric IDs.) Doing that will also make your life easier later when you decide you want to distribute across a storage cluster, since you can spread the directories around.
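A minimal sketch of that kind of sharding, assuming non-negative numeric IDs and one directory level per leading digit, as in the example path. The function name, the `levels` parameter, and the zero-padding of short IDs are assumptions made for illustration.

```python
from pathlib import PurePosixPath

# Root directory as written in the question.
AVATAR_ROOT = PurePosixPath("/user-avatars")


def sharded_path(user_id: int, original_filename: str, levels: int = 2) -> PurePosixPath:
    """Build /user-avatars/<d1>/<d2>/.../<ID>/<filename> from leading ID digits."""
    if user_id < 0 or levels < 0:
        raise ValueError("user_id and levels must be non-negative")
    # Assumption: IDs shorter than `levels` digits are left-padded with zeros
    # so that every ID yields the same directory depth.
    digits = str(user_id).zfill(levels)
    shard_dirs = digits[:levels]  # one directory name per leading digit
    # Keep only the final path component of the uploaded name.
    safe_name = PurePosixPath(original_filename).name
    return AVATAR_ROOT.joinpath(*shard_dirs, str(user_id), safe_name)
```

With the defaults, `sharded_path(12345, "original_filename.jpg")` yields `/user-avatars/1/2/12345/original_filename.jpg`. Two one-digit levels give only 100 buckets, so the depth needs tuning for the expected number of IDs. A common variant shards on the trailing digits or on a hash prefix of the ID instead, which spreads sequentially assigned IDs more evenly than leading digits do.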

aleatha answered Oct 21 '22 21:10



Millions of entries (files or directories) in one parent directory would be hard for any filesystem to deal with. Although modern filesystems use sorting and various tree algorithms to find files quickly, even navigating to the folder in Windows Explorer, Midnight Commander, or any other file manager becomes slow, because the file manager has to read the entire contents of the directory. The same applies to file search. So subdirectories are preferable here.

That said, I should note that accessing a particular file is a bit faster when all files are in one directory than when they are split across subdirectories, at least on NTFS (I measured this myself several times with 400K files).
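For anyone who wants to check this on their own filesystem, here is a rough sketch of such a measurement in Python. It is not the method used for the numbers above; the function names, the one-byte files, and the default file count are all assumptions, and operating system caching can easily dominate the result.

```python
import os
import tempfile
import time


def time_opens(paths):
    """Return the seconds taken to open and read every path once."""
    start = time.perf_counter()
    for path in paths:
        with open(path, "rb") as handle:
            handle.read()
    return time.perf_counter() - start


def build_layouts(root, count):
    """Create `count` tiny files twice: flat, and one file per sub-directory."""
    flat_dir = os.path.join(root, "flat")
    nested_dir = os.path.join(root, "nested")
    os.makedirs(flat_dir)
    os.makedirs(nested_dir)
    flat_paths, nested_paths = [], []
    for i in range(count):
        flat_path = os.path.join(flat_dir, f"{i}.jpg")
        sub_dir = os.path.join(nested_dir, str(i))
        os.mkdir(sub_dir)
        nested_path = os.path.join(sub_dir, "avatar.jpg")
        for path in (flat_path, nested_path):
            with open(path, "wb") as handle:
                handle.write(b"x")
        flat_paths.append(flat_path)
        nested_paths.append(nested_path)
    return flat_paths, nested_paths


def compare(count=1000):
    """Return (flat_seconds, nested_seconds) for reading `count` files each way."""
    with tempfile.TemporaryDirectory() as root:
        flat_paths, nested_paths = build_layouts(root, count)
        return time_opens(flat_paths), time_opens(nested_paths)
```

Calling `compare()` returns the two timings as a tuple. A small count like the default says little about behavior at 400K files, so the count should be raised, and the run repeated several times, before drawing any conclusion.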

Eugene Mayevski 'Callback answered Oct 21 '22 22:10
