More efficient method of getting Directory size

I've already built a recursive function to get the size of a directory given its path. It works, but as the number of directories I have to search (and the number of files in each folder) grows, it has become very slow and inefficient.

static string GetDirectorySize(string parentDir)
{
    long totalFileSize = 0;

    string[] dirFiles = Directory.GetFiles(parentDir, "*.*", 
                            System.IO.SearchOption.AllDirectories);

    foreach (string fileName in dirFiles)
    {
        // Use FileInfo to get length of each file.
        FileInfo info = new FileInfo(fileName);
        totalFileSize = totalFileSize + info.Length;
    }
    return String.Format(new FileSizeFormatProvider(), "{0:fs}", totalFileSize);
}

This searches all subdirectories of the argument path, so the dirFiles array gets quite large. Is there a better method to accomplish this? I've searched around but haven't found anything yet.

Another idea that crossed my mind was to cache the results and, when the function is called again, look for differences and re-scan only the folders that have changed. I'm not sure whether that's a good approach either...

ikathegreat asked Mar 22 '12 22:03



4 Answers

You are first scanning the tree to get a list of all files. Then you are reopening every file to get its size. This amounts to scanning twice.

I suggest you use DirectoryInfo.GetFiles which will hand you FileInfo objects directly. These objects are pre-filled with their length.

In .NET 4 you can also use the EnumerateFiles method, which returns a lazy IEnumerable<FileInfo>.

usr answered Oct 02 '22 06:10


This is more cryptic but it took about 2 seconds for 10k executions.

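    // Requires: using System.IO; using System.Linq;
    public static long GetDirectorySize(string parentDirectory)
    {
        // GetFiles on DirectoryInfo returns FileInfo objects, so Length can be summed directly.
        return new DirectoryInfo(parentDirectory).GetFiles("*.*", SearchOption.AllDirectories).Sum(file => file.Length);
    }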
MrFox answered Oct 02 '22 06:10


Try

        DirectoryInfo DirInfo = new DirectoryInfo(@"C:\DataLoad\");
        Stopwatch sw = new Stopwatch();
        try
        {
            sw.Start();
            Int64 ttl = 0;
            Int32 fileCount = 0;
            foreach (FileInfo fi in DirInfo.EnumerateFiles("*", SearchOption.AllDirectories))
            {
                ttl += fi.Length;
                fileCount++;
            }
            sw.Stop();
            Debug.WriteLine(sw.ElapsedMilliseconds.ToString() + " " + fileCount.ToString());
        }
        catch (Exception Ex)
        {
            Debug.WriteLine(Ex.ToString());
        }

This processed 700,000 files in 70 seconds on a desktop P4 without RAID, so roughly 10,000 files per second. A server-class machine should easily reach 100,000+ per second.

As usr (+1) said, the FileInfo objects returned by EnumerateFiles are pre-filled with the length.

paparazzo answered Oct 01 '22 06:10


You can start by speeding up your function a little with EnumerateFiles() instead of GetFiles(). At least you won't load the full list into memory.

If that's not enough, you can make your function more complex by using threads (one thread per directory is too many, but there is no general rule).
You could use a fixed number of threads that take directories from a queue; each thread calculates the size of a directory and adds it to the total. Something like:

  • Get the list of all directories (not files).
  • Create N threads (one per core, for example).
  • Each thread takes a directory from the queue and calculates its size.
  • If there is no other directory in the queue, the thread ends.
  • If there is a directory in the queue, the thread calculates its size, and so on.
  • The function finishes when all threads have terminated.
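
The steps above can be sketched roughly as follows. This is only a minimal illustration, not a tuned implementation: the class name ParallelDirectorySize and the workerCount parameter are made up for this example, and the workers are thread-pool tasks started with Task.Run rather than dedicated threads. Whether it is actually faster depends heavily on the storage device, since the work is mostly I/O-bound.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelDirectorySize
{
    // Hypothetical helper: sums the file sizes under root using a fixed number of workers.
    public static long GetDirectorySize(string root, int workerCount)
    {
        // Step 1: queue the root and every subdirectory (directories only, not files).
        var queue = new ConcurrentQueue<string>(
            new[] { root }.Concat(
                Directory.EnumerateDirectories(root, "*", SearchOption.AllDirectories)));

        long total = 0;

        // Step 2: start N workers.
        Task[] workers = Enumerable.Range(0, workerCount).Select(_ => Task.Run(() =>
        {
            // Steps 3-5: keep taking directories until the queue is empty.
            while (queue.TryDequeue(out string dir))
            {
                long size = 0;
                // Only the files directly inside this directory; subdirectories are queued separately.
                foreach (FileInfo file in new DirectoryInfo(dir)
                             .EnumerateFiles("*", SearchOption.TopDirectoryOnly))
                {
                    size += file.Length;
                }
                // Add this directory's size to the shared total atomically.
                Interlocked.Add(ref total, size);
            }
        })).ToArray();

        // Step 6: the function finishes when all workers have terminated.
        Task.WaitAll(workers);
        return total;
    }
}
```

It could be called as ParallelDirectorySize.GetDirectorySize(@"C:\DataLoad\", Environment.ProcessorCount). Note that this sketch does no error handling; a directory that cannot be read will make the call fail with an exception.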

You can improve the algorithm a lot by spreading the directory search itself across all threads (for example, when a thread processes a directory, it adds that directory's subfolders to the queue). It's up to you to make it more complicated if you find it's still too slow (Microsoft has used this task as an example for the new Task Parallel Library).

Adriano Repetti answered Oct 02 '22 06:10