
Is it OK (performance-wise) to have hundreds or thousands of files in the same Linux directory?

It's well known that in Windows a directory with too many files will have terrible performance when you try to open one of them. I have a program that will run only on Linux (currently on Debian Lenny, but I don't want to be specific about the distro) and that writes many files to the same directory (which acts somewhat as a repository).

By "many" I mean tens each day, meaning that after one year I expect to have something like 5,000-10,000 files. They are meant to be kept (once a file is created, it's never deleted) and it is assumed that the hard disk has the required capacity (if not, it should be upgraded). The files have a wide range of sizes, from a few KB to tens of MB (but not much more than that). The names are always numeric values, generated incrementally.

I'm worried about long-term performance degradation, so I'd ask:

  • Is it OK to write all to the same directory? Or should I think about creating a set of subdirectories for every X files?
  • Should I require a specific filesystem to be used for such directory?
  • What would be the more robust alternative? Specialized filesystem? Which?
  • Any other considerations/recommendations?
asked Jan 05 '12 by Fabio Ceconello


People also ask

How many files is too many in Linux?

By default, the number of open files for a single process is limited to 1024. Multiplying 1024 by the maximum number of user processes (5,041 in the quoted example) gives 5,161,984 – the maximum number of files that could be open across all of one user's processes at once.
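
If you want to check those limits on a system of your own, the usual places to look are the following (paths and values are from a typical Linux install, so treat this as an illustration):

ulimit -n                        # per-process soft limit on open file descriptors
cat /proc/sys/fs/file-max        # system-wide maximum number of open file handles
cat /proc/sys/fs/file-nr         # handles currently allocated / free / maximum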

How many files is too many in a directory?

On Windows you can put up to 4,294,967,295 files into a single folder if the drive is formatted with NTFS (it would be unusual if it were not), as long as you do not exceed 256 terabytes (the single-file size and volume limit) or the available disk space, whichever is less.

How many files can be stored in a folder in Linux?

It depends on the file system. ext3 supports ~32,000 subdirectories (not files!) in a given directory; with ext4 it's 64,000 by default, and there is no limit at all if the dir_index and dir_nlink features are set. XFS has no such limit to my knowledge.
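
If you are unsure what your own ext filesystem has enabled, tune2fs can tell you; /dev/sda1 below is only a placeholder device name:

sudo tune2fs -l /dev/sda1 | grep -i features     # look for dir_index and dir_nlink in the output
sudo tune2fs -O dir_index /dev/sda1              # enable hashed directory indexes on an existing filesystem
sudo e2fsck -fD /dev/sda1                        # rebuild the directory indexes; run only on an unmounted filesystem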

What offers good performance for large file system in Linux?

Overall we could recommend XFS as the best file system for large multiprocessor systems because it is fast at low CPU utilization. ext3 is the second-best choice.


2 Answers

  • Is it OK to write all to the same directory? Or should I think about creating a set of subdirectories for every X files?

In my experience, the only slowdown a directory with many files causes is when you do things such as getting a listing with ls. But that is mostly the fault of ls itself; there are faster ways of listing the contents of a directory using tools such as echo and find (see below).

  • Should I require a specific filesystem to be used for such directory?

Not because of the number of files in one directory, no. I'm sure some filesystems perform better with many small files in one directory while others do a better job with huge files, but that's also a matter of personal taste, akin to vi vs. emacs. I prefer the XFS filesystem, so that'd be my advice. :-)

  • What would be the more robust alternative? Specialized filesystem? Which?

XFS is definitely robust and fast; I use it in many places: as a boot partition, for Oracle tablespaces, as space for source control, you name it. It lags a bit on delete performance, but otherwise it's a safe bet. Plus it supports growing while it is still mounted (that's actually a requirement here): you delete the partition, recreate it with the same starting block and an ending block beyond the end of the original partition, and then run xfs_growfs on it with the filesystem mounted.
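
As a rough sketch of that grow procedure, assuming the XFS filesystem sits on /dev/sdb1 and is mounted at /data (both names are placeholders):

sudo growpart /dev/sdb 1     # enlarge partition 1 in place (growpart comes from cloud-utils;
                             # fdisk/parted delete-and-recreate with the same start sector also works)
sudo xfs_growfs /data        # grow the mounted filesystem to fill the partition; XFS can grow but never shrink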

  • Any other considerations/recommendations?

See above, with the addition that having 5,000 to 10,000 files in one directory should not be a problem. In practice it doesn't appreciably slow down the filesystem as far as I know, except for utilities such as "ls" and "rm". For those you could do:

find . -maxdepth 1 -type f | xargs echo             # list the files without relying on shell glob expansion
find . -maxdepth 1 -type f -print0 | xargs -0 rm    # remove them, handling unusual file names safely

The benefit of a directory tree, such as a directory "a" for file names starting with "a" and so on, is mostly cosmetic: it looks more organised, but you also lose the single overview of everything in one place. So what you're trying to do should be fine. :-)

I neglected to say you could consider using something called "sparse files" http://en.wikipedia.org/wiki/Sparse_file

answered by aseq


It depends very much on the file system.

ext2 and ext3 have a hard limit of about 32,000 subdirectories per directory. Regular files are not subject to that exact cap, but it is close enough to your numbers that I would still be careful. Also, without the dir_index feature, ext2 and ext3 perform a linear scan every time you access a file by name in the directory.

ext4 supposedly fixes these problems, but I cannot vouch for it personally.

XFS was designed for this sort of thing from the beginning and will work well even if you put millions of files in the directory.

So if you really need a huge number of files, I would use XFS or maybe ext4.

Note that no file system will make "ls" run fast if you have an enormous number of files (unless you use "ls -f"), since "ls" will read the entire directory and then sort the names. A few tens of thousands is probably not a big deal, but a good design should scale beyond what you think you need at first glance...
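
A quick way to see the difference for yourself on a big directory (the path here is just an example):

time ls /var/repository > /dev/null       # reads every entry, then sorts all the names before printing
time ls -f /var/repository > /dev/null    # -f skips the sort (and implies -a), so output streams immediately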

For the application you describe, I would probably create a hierarchy instead, since it costs hardly any additional coding and little mental effort for someone looking at it later. Specifically, you can name your first file "00/00/01" instead of "000001".
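
Here is a minimal bash sketch of that naming scheme; seq_to_path is a made-up helper, not something from the question's program:

# map an incrementally generated number to a nested path, e.g. 1 -> 00/00/01, 123456 -> 12/34/56
seq_to_path() {
    local padded
    printf -v padded '%06d' "$1"
    printf '%s/%s/%s\n' "${padded:0:2}" "${padded:2:2}" "${padded:4:2}"
}

path=$(seq_to_path 42)            # -> 00/00/42
mkdir -p "$(dirname "$path")"     # creates the 00/00 bucket on demand
touch "$path"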

answered by Nemo