
Multithreaded Directory Looping in C#

Tags: c#, .net, recursion

I am trying to loop through all files and folders and perform an action on every file that has a certain extension. This method works fine, but I would like to make it multithreaded: over tens of thousands of files it is really slow, and I would imagine multithreading would speed things up. I am just unsure how to use threading in this case.

doStuff reads properties (date modified, etc.) from the files and inserts them into a SQLite database. I start a transaction before the scan method is called, so that part is as optimized as it can be.

Answers that provide the theory on how to do it are just as good as full working code answers.

    private static string[] validTypes = { ".x", ".y", ".z", ".etc" };
    public static void scan(string rootDirectory)
    {
        try
        {

            foreach (string dir in Directory.GetDirectories(rootDirectory))
            {

                if (dir.ToLower().IndexOf("$recycle.bin") == -1)
                    scan(dir);
            }

            foreach (string file in Directory.GetFiles(rootDirectory))
            {

                if (!((IList<string>)validTypes).Contains(Path.GetExtension(file)))
                {
                    continue;
                }


                doStuff(file);
            }
        }
        catch (Exception)
        {
        }
    }

Alec Gorge asked Jan 22 '23 17:01



1 Answer

Assuming that doStuff is thread-safe, and that you don't need to wait for the entire scan to finish, you can call both doStuff and scan on the ThreadPool, like this:

    string path = file;
    ThreadPool.QueueUserWorkItem(delegate { doStuff(path); });

You need a separate local variable because the anonymous method would otherwise capture the file variable itself and see changes to it as the loop advances. (In other words, if the ThreadPool ran the task only after the loop had moved on to the next file, it would process the wrong file.) This applies to compilers before C# 5; from C# 5 on, foreach gives each iteration its own variable, so the copy is harmless but no longer required.
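
If you do need to wait for the scan to finish (for example, to commit the transaction afterwards), one possible approach is to count outstanding work items with a CountdownEvent, which is available from .NET 4.0. This is only a sketch: ThreadPoolScan and ProcessAll are made-up names, and the process delegate stands in for doStuff, which is still assumed to be thread-safe.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

static class ThreadPoolScan
{
    // Queues one work item per file and blocks until all of them have run.
    // 'process' is a stand-in for doStuff and must be thread-safe.
    public static void ProcessAll(IEnumerable<string> files, Action<string> process)
    {
        // Start at 1 so the count cannot hit zero while items are still being queued.
        using (var pending = new CountdownEvent(1))
        {
            foreach (string file in files)
            {
                string path = file;      // per-iteration copy captured by the delegate
                pending.AddCount();
                ThreadPool.QueueUserWorkItem(delegate
                {
                    // An exception escaping 'process' is unhandled on a pool thread,
                    // so real code should catch and record it inside 'process'.
                    try { process(path); }
                    finally { pending.Signal(); }
                });
            }
            pending.Signal();            // release the initial count
            pending.Wait();              // returns once every work item has signalled
        }
    }
}
```

Inside the foreach loop over files, the call would look something like ThreadPoolScan.ProcessAll(Directory.GetFiles(rootDirectory), doStuff), with the extension filter applied either before the call or inside doStuff.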

However, reading your comment, the main issue here is disk IO, so I suspect that multithreading will not help much.
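
A rough way to check that suspicion before adding threads is to compare the CPU time the process uses with the wall-clock time of a scan. The helper below is a sketch, not a profiler: IoCheck and CpuFraction are hypothetical names, and the result is distorted by file-system caching, so a second run over the same tree can look very different from the first.

```csharp
using System;
using System.Diagnostics;

static class IoCheck
{
    // Returns CPU time divided by wall-clock time for 'work'.
    // A value far below 1 suggests the thread spent most of its time waiting
    // (for example on disk); a value near 1 suggests CPU-bound work.
    // The value can exceed 1 if other threads in the process are busy.
    public static double CpuFraction(Action work)
    {
        Process self = Process.GetCurrentProcess();
        TimeSpan cpuBefore = self.TotalProcessorTime;
        Stopwatch wall = Stopwatch.StartNew();

        work();

        wall.Stop();
        self.Refresh();   // discard cached process info before reading again
        TimeSpan cpuUsed = self.TotalProcessorTime - cpuBefore;
        return cpuUsed.TotalMilliseconds / wall.Elapsed.TotalMilliseconds;
    }
}
```

For example, IoCheck.CpuFraction(() => scan(rootDirectory)) on a cold run gives a first impression of how much of the time goes to waiting rather than computing.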

Note that Directory.GetFiles performs slowly for directories with large numbers of files, since it has to read the whole directory and allocate an array holding all of the filenames before it returns.
If you're using .NET 4.0, you can make it faster by calling the EnumerateFiles method instead, which uses an iterator to return an IEnumerable<string> that enumerates the directory as you run your loop.
You can also avoid the recursive scan calls with either method by passing the SearchOption parameter, like this:

    foreach (string file in Directory.EnumerateFiles(rootDirectory, "*", SearchOption.AllDirectories))

This will recursively scan all subdirectories, so you'll only need a single foreach loop.
Note that with GetFiles this makes the performance problem worse, because the array then has to hold every file in the whole tree before the loop can start, so you may want to avoid it before .NET 4.0.
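
Put together, the single-loop version might look roughly like the sketch below. FlatScan and MatchingFiles are made-up names; the extensions are the placeholders from the question. Two things differ from the original scan method and are assumptions on my part: the extension check ignores case, and there is no per-directory try/catch, so a directory that cannot be read may throw UnauthorizedAccessException and end the enumeration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class FlatScan
{
    // Placeholder extensions from the question; the comparer makes ".X" match ".x".
    private static readonly HashSet<string> ValidTypes =
        new HashSet<string>(new[] { ".x", ".y", ".z", ".etc" }, StringComparer.OrdinalIgnoreCase);

    // Lazily yields every file under rootDirectory whose extension is in ValidTypes.
    public static IEnumerable<string> MatchingFiles(string rootDirectory)
    {
        foreach (string file in Directory.EnumerateFiles(rootDirectory, "*", SearchOption.AllDirectories))
        {
            // Skip anything inside the recycle bin, as the original scan method does.
            if (file.IndexOf("$recycle.bin", StringComparison.OrdinalIgnoreCase) >= 0)
                continue;

            if (ValidTypes.Contains(Path.GetExtension(file)))
                yield return file;
        }
    }
}
```

The caller then needs only one loop: foreach (string file in FlatScan.MatchingFiles(rootDirectory)) doStuff(file);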

SLaks answered Feb 01 '23 20:02
