I have a job that runs every night to pull xml files from a directory that has over 20,000 subfolders under the root. Here is what the structure looks like:
rootFolder/someFolder/someSubFolder/xml/myFile.xml
rootFolder/someFolder/someSubFolder1/xml/myFile1.xml
rootFolder/someFolder/someSubFolderN/xml/myFile2.xml
rootFolder/someFolder1
rootFolder/someFolderN
So looking at the above, the structure is always the same - a root folder, then two subfolders, then an xml directory, and then the xml file. Only the name of the rootFolder and the xml directory are known to me.
The code below traverses through all the directories and is extremely slow. Any recommendations on how I can optimize the search especially if the directory structure is known?
string[] files = Directory.GetFiles(@"\\somenetworkpath\rootFolder", "*.xml", SearchOption.AllDirectories);
Rather than doing GetFiles and doing a brute force search you could most likely use GetDirectories, first to get a list of the "First sub folder", loop through those directories, then repeat the process for the sub folder, looping through them, lastly look for the xml folder, and finally searching for .xml files.
Now, as for performance the speed of this will vary, but searching for directories first, THEN getting to files should help a lot!
Update
Ok, I did a quick bit of testing and you can actually optimize it much further than I thought.
The following code snippet will search a directory structure and find ALL "xml" folders inside the entire directory tree.
string startPath = @"C:\Testing\Testing\bin\Debug";
string[] oDirectories = Directory.GetDirectories(startPath, "xml", SearchOption.AllDirectories);
Console.WriteLine(oDirectories.Length.ToString());
foreach (string oCurrent in oDirectories)
Console.WriteLine(oCurrent);
Console.ReadLine();
If you drop that into a test console app you will see it output the results.
Now, once you have this, just look in each of the found directories for you .xml files.
I created a recursive method GetFolders
using a Parallel.ForEach
to find all the folders named as the variable yourKeyword
List<string> returnFolders = new List<string>();
object locker = new object();
Parallel.ForEach(subFolders, subFolder =>
{
if (subFolder.ToUpper().EndsWith(yourKeyword))
{
lock (locker)
{
returnFolders.Add(subFolder);
}
}
else
{
lock (locker)
{
returnFolders.AddRange(GetFolders(Directory.GetDirectories(subFolder)));
}
}
});
return returnFolders;
Are there additional directories at the same level as the xml folder? If so, you could probably speed up the search if you do it yourself and eliminate that level from searching.
System.IO.DirectoryInfo root = new System.IO.DirectoryInfo(rootPath);
List<System.IO.FileInfo> xmlFiles=new List<System.IO.FileInfo>();
foreach (System.IO.DirectoryInfo subDir1 in root.GetDirectories())
{
foreach (System.IO.DirectoryInfo subDir2 in subDir1.GetDirectories())
{
System.IO.DirectoryInfo xmlDir = new System.IO.DirectoryInfo(System.IO.Path.Combine(subDir2.FullName, "xml"));
if (xmlDir.Exists)
{
xmlFiles.AddRange(xmlDir.GetFiles("*.xml"));
}
}
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With