Quicker (quickest?) way to get number of files in a directory with over 200,000 files

Tags: .net, file-io

I have some directories containing test data, typically over 200,000 small (~4k) files per directory.

I am using the following C# code to get the number of files in a directory:

int fileCount = System.IO.Directory.GetFiles(@"C:\SomeDirectory").Length;

This is very, very slow, however. Are there any alternatives that I can use?

Edit

Each folder contains data for one day, and we will have around 18 months of directories (~550 directories). I am also very interested in performance enhancements people have found by reworking flat directory structures to more nested ones.

Richard Ev asked Jul 28 '09 09:07


2 Answers

The code you've got is slow because it first gets an array of all the available files, then takes the length of that array.

However, you're almost certainly not going to find any solutions that work much faster than that.

Why?

Access controls.

Each file in a directory may have an access control list - which may prevent you from seeing the file at all.

The operating system itself can't just say "hey, there are 100 file entries here" because some of them may represent files you're not allowed to know exist - they shouldn't be shown to you at all. So the OS itself has to iterate over the files, checking access permissions file by file.
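To check how expensive the call really is on your own data, a small timing sketch like the one below may help. FileCountTiming and CountAndTime are hypothetical names, not part of the framework, and the numbers will vary with file system caching, so expect the first run on a directory to be slower than later ones.

```csharp
using System;
using System.Diagnostics;
using System.IO;

static class FileCountTiming
{
    // Hypothetical helper: counts files the same way as the question's code
    // and reports how long the whole operation took.
    public static int CountAndTime(string path, out TimeSpan elapsed)
    {
        Stopwatch timer = Stopwatch.StartNew();

        // Step 1: build an array holding the full path of every file.
        string[] files = Directory.GetFiles(path);

        // Step 2: read the array length (this part is effectively free).
        int count = files.Length;

        timer.Stop();
        elapsed = timer.Elapsed;
        return count;
    }
}
```

Usage would be something like: TimeSpan t; int n = FileCountTiming.CountAndTime(@"C:\SomeDirectory", out t);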

For a discussion that goes into more detail around this kind of thing, see two posts from The Old New Thing:

  • Why doesn't the file system have a function that tells you the number of files in a directory?
  • Why doesn't Explorer show recursive directory size as an optional column?

[As an aside, if you want to improve performance of a directory containing a lot of files, limit yourself to strictly 8.3 filenames. No, I'm not kidding: it's faster, because the OS doesn't have to generate an 8.3 filename itself, and because the algorithm used is braindead. Try a benchmark and you'll see.]

Bevan answered Oct 28 '22 16:10


FYI, .NET 4 includes a new method, Directory.EnumerateFiles, that is awesome: it returns the file names lazily instead of building the whole array up front. Chances are you're not using .NET 4, but it's worth remembering anyway!

Edit: I now realise that the OP wanted the NUMBER of files. However, this method is so useful I'm keeping this post here.
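For what it's worth, the method can still be used to get a count. Below is a minimal sketch, assuming .NET 4 or later and LINQ's Count() extension method; LazyFileCount is a hypothetical name. It avoids holding one string per file in a single array, but the file system still has to walk every entry, so it may not be dramatically faster.

```csharp
using System.IO;
using System.Linq;

static class LazyFileCount
{
    // Hypothetical helper: counts files by streaming over the directory
    // entries instead of materialising them all in an array first.
    public static int Count(string path)
    {
        // EnumerateFiles yields names one at a time; Count() consumes the
        // sequence and keeps only a running total.
        return Directory.EnumerateFiles(path).Count();
    }
}
```

Usage would be something like: int fileCount = LazyFileCount.Count(@"C:\SomeDirectory");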

Richard Szalay answered Oct 28 '22 18:10