Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Quickest way in C# to find a file in a directory with over 20,000 files

Tags:

c#

.net

file-io

I have a job that runs every night to pull xml files from a directory that has over 20,000 subfolders under the root. Here is what the structure looks like:

rootFolder/someFolder/someSubFolder/xml/myFile.xml
rootFolder/someFolder/someSubFolder1/xml/myFile1.xml
rootFolder/someFolder/someSubFolderN/xml/myFile2.xml
rootFolder/someFolder1
rootFolder/someFolderN

So looking at the above, the structure is always the same - a root folder, then two subfolders, then an xml directory, and then the xml file. Only the name of the rootFolder and the xml directory are known to me.

The code below traverses through all the directories and is extremely slow. Any recommendations on how I can optimize the search especially if the directory structure is known?

string[] files = Directory.GetFiles(@"\\somenetworkpath\rootFolder", "*.xml", SearchOption.AllDirectories);
like image 355
adeel825 Avatar asked Apr 03 '09 14:04

adeel825


3 Answers

Rather than doing GetFiles and doing a brute force search you could most likely use GetDirectories, first to get a list of the "First sub folder", loop through those directories, then repeat the process for the sub folder, looping through them, lastly look for the xml folder, and finally searching for .xml files.

Now, as for performance the speed of this will vary, but searching for directories first, THEN getting to files should help a lot!

Update

Ok, I did a quick bit of testing and you can actually optimize it much further than I thought.

The following code snippet will search a directory structure and find ALL "xml" folders inside the entire directory tree.

string startPath = @"C:\Testing\Testing\bin\Debug";
string[] oDirectories = Directory.GetDirectories(startPath, "xml", SearchOption.AllDirectories);
Console.WriteLine(oDirectories.Length.ToString());
foreach (string oCurrent in oDirectories)
    Console.WriteLine(oCurrent);
Console.ReadLine();

If you drop that into a test console app you will see it output the results.

Now, once you have this, just look in each of the found directories for you .xml files.

like image 181
Mitchel Sellers Avatar answered Oct 18 '22 15:10

Mitchel Sellers


I created a recursive method GetFolders using a Parallel.ForEach to find all the folders named as the variable yourKeyword

List<string> returnFolders = new List<string>();
object locker = new object();

Parallel.ForEach(subFolders, subFolder =>
{
    if (subFolder.ToUpper().EndsWith(yourKeyword))
    {
        lock (locker)
        {
            returnFolders.Add(subFolder);
        }
    }
    else
    {
        lock (locker)
        {
            returnFolders.AddRange(GetFolders(Directory.GetDirectories(subFolder)));
        }
    }
});

return returnFolders;
like image 23
ceffoh Avatar answered Oct 18 '22 14:10

ceffoh


Are there additional directories at the same level as the xml folder? If so, you could probably speed up the search if you do it yourself and eliminate that level from searching.

        System.IO.DirectoryInfo root = new System.IO.DirectoryInfo(rootPath);
        List<System.IO.FileInfo> xmlFiles=new List<System.IO.FileInfo>();

        foreach (System.IO.DirectoryInfo subDir1 in root.GetDirectories())
        {
            foreach (System.IO.DirectoryInfo subDir2 in subDir1.GetDirectories())
            {
                System.IO.DirectoryInfo xmlDir = new System.IO.DirectoryInfo(System.IO.Path.Combine(subDir2.FullName, "xml"));

                if (xmlDir.Exists)
                {
                    xmlFiles.AddRange(xmlDir.GetFiles("*.xml"));
                }
            }
        }
like image 33
Adam Robinson Avatar answered Oct 18 '22 16:10

Adam Robinson