 

Show progress when searching all files in a directory

I previously asked the question Get all files and directories in specific path fast in order to find files as fast as possible. I am using that solution to find the file names that match a regular expression.

I was hoping to show a progress bar, because with some really large and slow hard drives it still takes about a minute to execute. The solution I posted on the other question does not tell me how many files remain to be traversed, which I would need in order to show a progress bar.

One solution I was considering was to obtain the size of the directory I am planning to traverse. For example, when I right-click on the folder C:\Users I get an estimate of how big that directory is. If I know the size up front, then I can show progress by adding up the size of every file I find. In other words, progress = (current sum of file sizes) / directory size.

For some reason I have not been able to efficiently get the size of that directory.

Some of the questions on Stack Overflow use the following approach:

[Screenshot: code that enumerates all files under a directory to count them and sum their sizes]

But note that I get an exception and am not able to enumerate the files. I am curious to try that method on my C: drive.

In that screenshot I was trying to count the number of files in order to show progress. I will probably not be able to get the number of files efficiently using that approach. I was just trying some of the answers from Stack Overflow questions about how to get the number of files in a directory and how to get the size of a directory.
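Roughly, the approach I was trying looks like the sketch below (my reconstruction, not the exact code from the screenshot). On a root such as C:\ the enumeration typically fails with an UnauthorizedAccessException as soon as it reaches a folder the account cannot read, which is most likely the exception I am seeing.

using System;
using System.IO;

class DirectorySizeSample
{
    static void Main()
    {
        // Walk the whole tree, counting files and summing their sizes.
        // On a folder like C:\ this throws UnauthorizedAccessException as soon as
        // the enumeration reaches a directory the account is not allowed to read.
        long fileCount = 0, totalBytes = 0;
        var root = new DirectoryInfo(@"C:\Users");
        foreach (var file in root.EnumerateFiles("*", SearchOption.AllDirectories))
        {
            fileCount++;
            totalBytes += file.Length;
        }
        Console.WriteLine("{0} files, {1} bytes", fileCount, totalBytes);
    }
}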

Asked Sep 12 '12 by Tono Nam

1 Answer

Solving this is going to leave you with one of a few possibilities...

  1. Not displaying progress at all
  2. Using an up-front cost to compute (like Windows)
  3. Performing the operation while computing the cost

If speed is that important and you expect large directory trees, I would lean toward the last of these options. I've added an answer on the linked question Get all files and directories in specific path fast that demonstrates a faster means of counting files and sizes than the one you are currently using. To combine this into a multi-threaded piece of code for option #3, the following can be done...

using System;
using System.Threading;

static class Program
{
    static void Main()
    {
        const string directory = @"C:\Program Files";

        // First pass: enumerate the files on a separate thread, simply counting them
        // so we have a (growing) total to compute a percentage against.
        long total = 0;
        var fcounter = new CSharpTest.Net.IO.FindFile(directory, "*", true, true, true);
        fcounter.RaiseOnAccessDenied = false;
        fcounter.FileFound +=
            (o, e) =>
            {
                if (!e.IsDirectory)
                {
                    Interlocked.Increment(ref total);
                }
            };

        // Start a high-priority background thread to perform the counting
        Thread t = new Thread(fcounter.Find)
        {
            IsBackground = true,
            Priority = ThreadPriority.AboveNormal,
            Name = "file enum"
        };
        t.Start();

        // Allow the counting thread to get a head-start on us
        do { Thread.Sleep(100); }
        while (total < 100 && t.IsAlive);

        // Second pass: process the files normally and update a percentage
        long count = 0, percentage = 0;
        var task = new CSharpTest.Net.IO.FindFile(directory, "*", true, true, true);
        task.RaiseOnAccessDenied = false;
        task.FileFound +=
            (o, e) =>
            {
                if (!e.IsDirectory)
                {
                    ProcessFile(e.FullPath);
                    // Update the percentage complete...
                    long progress = ++count * 100 / Interlocked.Read(ref total);
                    if (progress > percentage && progress <= 100)
                    {
                        percentage = progress;
                        Console.WriteLine("{0}% complete.", percentage);
                    }
                }
            };

        task.Find();
    }

    static void ProcessFile(string fullPath)
    {
        // Your per-file work goes here, e.g. matching the file name against your regular expression.
    }
}

The FindFile class implementation can be found at FindFile.cs.

Depending on how expensive your file-processing task is (the ProcessFile function above), you should see a very smooth progression of the progress on large volumes of files. If your file processing is extremely fast, you may want to increase the lag between the start of enumeration and the start of processing.
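For example, the head-start loop above could be changed to wait longer before processing begins (a sketch only; the 250 ms sleep and the 1000-file threshold are arbitrary values, not part of the original code):

// Give the counting thread a bigger head-start before processing begins.
do { Thread.Sleep(250); }
while (Interlocked.Read(ref total) < 1000 && t.IsAlive);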

The event argument is of type FindFile.FileFoundEventArgs and is a mutable class, so be sure you don't keep a reference to the event argument, as its values will change.
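For instance, if you want to collect results for later, copy the values you need out of the event argument rather than storing the argument itself (a sketch; it assumes a using System.Collections.Generic; directive and the task instance from the code above):

var found = new List<string>();
task.FileFound +=
    (o, e) =>
    {
        if (!e.IsDirectory)
            found.Add(e.FullPath); // copy the string value; do not keep 'e' itself
    };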

Ideally you will want to add error handling and probably the ability to abort both enumerations. Aborting the enumeration can be done by setting "CancelEnumeration" on the event argument.
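A sketch of such an abort, assuming CancelEnumeration is a settable boolean on the event argument as described above, and that stopRequested is a flag you set elsewhere (for example from a Cancel button):

// Somewhere visible to the handler:
static volatile bool stopRequested = false;

// Added to the FileFound handler of either enumeration:
task.FileFound +=
    (o, e) =>
    {
        if (stopRequested)
            e.CancelEnumeration = true; // stops the Find() call early
    };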

Answered by csharptest.net