I have a base directory that contains several thousand folders. Inside of these folders there can be between 1 and 20 subfolders that contains between 1 and 10 files. I'd like to delete all files that are over 60 days old. I was using the code below to get the list of files that I would have to delete:
DirectoryInfo dirInfo = new DirectoryInfo(myBaseDirectory);
FileInfo[] oldFiles =
dirInfo.GetFiles("*.*", SearchOption.AllDirectories)
.Where(t=>t.CreationTime < DateTime.Now.AddDays(-60)).ToArray();
But I let this run for about 30 minutes and it still hasn't finished. I'm curious if anyone can see anyway that I could potentially improve the performance of the above line or if there is a different way I should be approaching this entirely for better performance? Suggestions?
To enumerate directories and files, use methods that return an enumerable collection of directory or file names, or their DirectoryInfo, FileInfo, or FileSystemInfo objects. If you want to search and return only the names of directories or files, use the enumeration methods of the Directory class.
EnumerateFiles(String, String, EnumerationOptions) Returns an enumerable collection of full file names that match a search pattern and enumeration options in a specified path, and optionally searches subdirectories.
File/parameter enumeration is a common technique used to search for suspicious files and parameter values in order to detect their existence or validity. Using this technique, it is possible to map additional parts of the application, which are not normally exposed to the public.
This is (probably) as good as it's going to get:
DateTime sixtyLess = DateTime.Now.AddDays(-60);
DirectoryInfo dirInfo = new DirectoryInfo(myBaseDirectory);
FileInfo[] oldFiles =
dirInfo.EnumerateFiles("*.*", SearchOption.AllDirectories)
.AsParallel()
.Where(fi => fi.CreationTime < sixtyLess).ToArray();
Changes:
DateTime
constant, and therefore less CPU load.EnumerateFiles
.Should run in a smaller amount of time (not sure how much smaller).
Here is another solution which might be faster or slower than the first, it depends on the data:
DateTime sixtyLess = DateTime.Now.AddDays(-60);
DirectoryInfo dirInfo = new DirectoryInfo(myBaseDirectory);
FileInfo[] oldFiles =
dirInfo.EnumerateDirectories()
.AsParallel()
.SelectMany(di => di.EnumerateFiles("*.*", SearchOption.AllDirectories)
.Where(fi => fi.CreationTime < sixtyLess))
.ToArray();
Here it moves the parallelism to the main folder enumeration. Most of the changes from above apply too.
A possibly faster alternative is to use WINAPI FindNextFile
. There is an excellent Faster Directory Enumeration Tool for this. Which can be used as follows:
HashSet<FileData> GetPast60(string dir)
{
DateTime retval = DateTime.Now.AddDays(-60);
HashSet<FileData> oldFiles = new HashSet<FileData>();
FileData [] files = FastDirectoryEnumerator.GetFiles(dir);
for (int i=0; i<files.Length; i++)
{
if (files[i].LastWriteTime < retval)
{
oldFiles.Add(files[i]);
}
}
return oldFiles;
}
So, based on comments below, I decided to do a benchmark of suggested solutions here as well as others I could think of. It was interesting enough to see that EnumerateFiles seemed to out-perform FindNextFile in C#, while EnumerateFiles
with AsParallel
was by far the fastest followed surprisingly by command prompt count. However do note that AsParallel
wasn't getting the complete file count or was missing some files counted by the others so you could say the command prompt method is the best.
Applicable Config:
Below are three screenshots:
I have included my test code below:
static void Main(string[] args)
{
Console.Title = "File Enumeration Performance Comparison";
Stopwatch watch = new Stopwatch();
watch.Start();
var allfiles = GetPast60("C:\\Users\\UserName\\Documents");
watch.Stop();
Console.WriteLine("Total time to enumerate using WINAPI =" + watch.ElapsedMilliseconds + "ms.");
Console.WriteLine("File Count: " + allfiles);
Stopwatch watch1 = new Stopwatch();
watch1.Start();
var allfiles1 = GetPast60Enum("C:\\Users\\UserName\\Documents\\");
watch1.Stop();
Console.WriteLine("Total time to enumerate using EnumerateFiles =" + watch1.ElapsedMilliseconds + "ms.");
Console.WriteLine("File Count: " + allfiles1);
Stopwatch watch2 = new Stopwatch();
watch2.Start();
var allfiles2 = Get1("C:\\Users\\UserName\\Documents\\");
watch2.Stop();
Console.WriteLine("Total time to enumerate using Get1 =" + watch2.ElapsedMilliseconds + "ms.");
Console.WriteLine("File Count: " + allfiles2);
Stopwatch watch3 = new Stopwatch();
watch3.Start();
var allfiles3 = Get2("C:\\Users\\UserName\\Documents\\");
watch3.Stop();
Console.WriteLine("Total time to enumerate using Get2 =" + watch3.ElapsedMilliseconds + "ms.");
Console.WriteLine("File Count: " + allfiles3);
Stopwatch watch4 = new Stopwatch();
watch4.Start();
var allfiles4 = RunCommand(@"dir /a: /b /s C:\Users\UserName\Documents");
watch4.Stop();
Console.WriteLine("Total time to enumerate using Command Prompt =" + watch4.ElapsedMilliseconds + "ms.");
Console.WriteLine("File Count: " + allfiles4);
Console.WriteLine("Press Any Key to Continue...");
Console.ReadLine();
}
private static int RunCommand(string command)
{
var process = new Process()
{
StartInfo = new ProcessStartInfo("cmd")
{
UseShellExecute = false,
RedirectStandardInput = true,
RedirectStandardOutput = true,
CreateNoWindow = true,
Arguments = String.Format("/c \"{0}\"", command),
}
};
int count = 0;
process.OutputDataReceived += delegate { count++; };
process.Start();
process.BeginOutputReadLine();
process.WaitForExit();
return count;
}
static int GetPast60Enum(string dir)
{
return new DirectoryInfo(dir).EnumerateFiles("*.*", SearchOption.AllDirectories).Count();
}
private static int Get2(string myBaseDirectory)
{
DirectoryInfo dirInfo = new DirectoryInfo(myBaseDirectory);
return dirInfo.EnumerateFiles("*.*", SearchOption.AllDirectories)
.AsParallel().Count();
}
private static int Get1(string myBaseDirectory)
{
DirectoryInfo dirInfo = new DirectoryInfo(myBaseDirectory);
return dirInfo.EnumerateDirectories()
.AsParallel()
.SelectMany(di => di.EnumerateFiles("*.*", SearchOption.AllDirectories))
.Count() + dirInfo.EnumerateFiles("*.*", SearchOption.TopDirectoryOnly).Count();
}
private static int GetPast60(string dir)
{
return FastDirectoryEnumerator.GetFiles(dir, "*.*", SearchOption.AllDirectories).Length;
}
NB: I concentrated on count in the benchmark not modified date.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With