 

Is there a faster way to scan through a directory recursively in .NET?

I am writing a directory scanner in .NET.

For each file/directory I need the following info:

    class Info {
        public bool IsDirectory;
        public string Path;
        public DateTime ModifiedDate;
        public DateTime CreatedDate;
    }

I have this function:

    static List<Info> RecursiveMovieFolderScan(string path) {
        var info = new List<Info>();
        var dirInfo = new DirectoryInfo(path);
        foreach (var dir in dirInfo.GetDirectories()) {
            info.Add(new Info() {
                IsDirectory = true,
                CreatedDate = dir.CreationTimeUtc,
                ModifiedDate = dir.LastWriteTimeUtc,
                Path = dir.FullName
            });

            info.AddRange(RecursiveMovieFolderScan(dir.FullName));
        }

        foreach (var file in dirInfo.GetFiles()) {
            info.Add(new Info()
            {
                IsDirectory = false,
                CreatedDate = file.CreationTimeUtc,
                ModifiedDate = file.LastWriteTimeUtc,
                Path = file.FullName
            });
        }

        return info;
    }

It turns out this implementation is quite slow. Is there any way to speed it up? I'm thinking of hand-coding this with FindFirstFileW, but would like to avoid that if there is a faster built-in way.
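For comparison, here is a sketch of a built-in alternative, assuming .NET Framework 4 or later. `DirectoryInfo.EnumerateFileSystemInfos` streams entries lazily and, as far as I know, fills in each returned object's timestamps from the data gathered during enumeration, so no extra per-entry query should be needed. An explicit stack replaces the recursion, which also avoids building a `List<Info>` per subdirectory and copying it into its parent with `AddRange`. `BuiltInScan` and `Scan` are hypothetical names; the sketch does not handle access-denied errors or symbolic-link cycles, and it emits entries in a different order than the original.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public class Info
{
    public bool IsDirectory;
    public string Path;
    public DateTime ModifiedDate;
    public DateTime CreatedDate;
}

public static class BuiltInScan
{
    // Hypothetical helper: iterative walk using an explicit stack of pending directories.
    public static List<Info> Scan(string root)
    {
        var result = new List<Info>();
        var pending = new Stack<DirectoryInfo>();
        pending.Push(new DirectoryInfo(root));

        while (pending.Count > 0)
        {
            DirectoryInfo dir = pending.Pop();

            // Lazily enumerates files and subdirectories of one directory (.NET 4+).
            foreach (FileSystemInfo entry in dir.EnumerateFileSystemInfos())
            {
                bool isDir = entry is DirectoryInfo;
                result.Add(new Info
                {
                    IsDirectory = isDir,
                    Path = entry.FullName,
                    CreatedDate = entry.CreationTimeUtc,
                    ModifiedDate = entry.LastWriteTimeUtc
                });

                // Queue subdirectories instead of recursing into them.
                if (entry is DirectoryInfo subDir)
                    pending.Push(subDir);
            }
        }
        return result;
    }
}
```

Whether this beats the original on a given machine is worth measuring rather than assuming.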

asked Apr 07 '09 04:04 by Sam Saffron


2 Answers

This implementation, which still needs a bit of tweaking, is 5-10x faster.

    static List<Info> RecursiveScan2(string directory) {
        IntPtr INVALID_HANDLE_VALUE = new IntPtr(-1);
        WIN32_FIND_DATAW findData;
        IntPtr findHandle = INVALID_HANDLE_VALUE;

        var info = new List<Info>();
        try {
            findHandle = FindFirstFileW(directory + @"\*", out findData);
            if (findHandle != INVALID_HANDLE_VALUE) {
                do {
                    if (findData.cFileName == "." || findData.cFileName == "..") continue;

                    string fullpath = directory + (directory.EndsWith("\\") ? "" : "\\") + findData.cFileName;

                    bool isDir = false;

                    if ((findData.dwFileAttributes & FileAttributes.Directory) != 0) {
                        isDir = true;
                        info.AddRange(RecursiveScan2(fullpath));
                    }

                    info.Add(new Info()
                    {
                        CreatedDate = findData.ftCreationTime.ToDateTime(),
                        ModifiedDate = findData.ftLastWriteTime.ToDateTime(),
                        IsDirectory = isDir,
                        Path = fullpath
                    });
                }
                while (FindNextFile(findHandle, out findData));
            }
        } finally {
            if (findHandle != INVALID_HANDLE_VALUE) FindClose(findHandle);
        }
        return info;
    }

The FILETIME-to-DateTime extension method:

    public static class FILETIMEExtensions {
        public static DateTime ToDateTime(this System.Runtime.InteropServices.ComTypes.FILETIME filetime) {
            long highBits = filetime.dwHighDateTime;
            highBits = highBits << 32;
            // dwLowDateTime is a signed int: cast it through uint so that a set top bit
            // is not sign-extended into the high 32 bits.
            return DateTime.FromFileTimeUtc(highBits | (uint)filetime.dwLowDateTime);
        }
    }
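As a worked check of that conversion: a FILETIME is a 64-bit count of 100-nanosecond intervals since January 1, 1601 (UTC), split into two 32-bit halves that `ComTypes.FILETIME` exposes as signed `int`s. The sketch below (a hypothetical `FileTimeMath` helper) compares combining the halves with the low half widened as unsigned versus sign-extended.

```csharp
using System;

public static class FileTimeMath
{
    // Correct: uint -> long is a zero extension, so the low 32 bits stay as-is.
    public static long Combine(int high, int low)
    {
        return ((long)high << 32) | (uint)low;
    }

    // Naive: (long)low sign-extends when bit 31 of the low half is set,
    // effectively subtracting 2^32 from the result.
    public static long CombineNaive(int high, int low)
    {
        return ((long)high << 32) + (long)low;
    }
}
```

When bit 31 of the low half is set, the naive version comes out 2^32 ticks short, which is about 429.5 seconds (roughly seven minutes), so the error only shows up on some timestamps.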

The interop definitions are:

    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    public static extern IntPtr FindFirstFileW(string lpFileName, out WIN32_FIND_DATAW lpFindFileData);

    [DllImport("kernel32.dll", CharSet = CharSet.Unicode)]
    public static extern bool FindNextFile(IntPtr hFindFile, out WIN32_FIND_DATAW lpFindFileData);

    [DllImport("kernel32.dll")]
    public static extern bool FindClose(IntPtr hFindFile);

    [StructLayout(LayoutKind.Sequential, CharSet = CharSet.Unicode)]
    public struct WIN32_FIND_DATAW {
        public FileAttributes dwFileAttributes;
        internal System.Runtime.InteropServices.ComTypes.FILETIME ftCreationTime;
        internal System.Runtime.InteropServices.ComTypes.FILETIME ftLastAccessTime;
        internal System.Runtime.InteropServices.ComTypes.FILETIME ftLastWriteTime;
        public int nFileSizeHigh;
        public int nFileSizeLow;
        public int dwReserved0;
        public int dwReserved1;
        [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 260)]
        public string cFileName;
        [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 14)]
        public string cAlternateFileName;
    }
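Because the marshaler copies the native structure field by field, the managed declaration has to produce the same native size as `WIN32_FIND_DATAW`: 4 bytes of attributes, three 8-byte FILETIMEs, four 4-byte integers, and 260 + 14 UTF-16 characters, i.e. 4 + 24 + 16 + 520 + 28 = 592 bytes. A quick sanity check, assuming the struct is declared as above (the FILETIME fields are made public here, which does not affect layout; `LayoutCheck` is a hypothetical name):

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;
using FILETIME = System.Runtime.InteropServices.ComTypes.FILETIME;

[StructLayout(LayoutKind.Sequential, CharSet = CharSet.Unicode)]
public struct WIN32_FIND_DATAW
{
    public FileAttributes dwFileAttributes;   // 4 bytes (int-backed enum)
    public FILETIME ftCreationTime;           // 8 bytes each (two ints)
    public FILETIME ftLastAccessTime;
    public FILETIME ftLastWriteTime;
    public int nFileSizeHigh;                 // 4 x 4 bytes
    public int nFileSizeLow;
    public int dwReserved0;
    public int dwReserved1;
    [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 260)]
    public string cFileName;                  // 260 UTF-16 chars = 520 bytes
    [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 14)]
    public string cAlternateFileName;         // 14 UTF-16 chars = 28 bytes
}

public static class LayoutCheck
{
    // Size of the unmanaged representation the marshaler will use.
    public static int NativeSize() => Marshal.SizeOf(typeof(WIN32_FIND_DATAW));
}
```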
answered Oct 05 '22 03:10 by Sam Saffron


There is a long history of the .NET file enumeration methods being slow. The underlying problem is that there is no instantaneous way to enumerate large directory structures. Even the accepted answer here has its issues with GC allocations.

The best I've been able to do is wrapped up in my library and exposed as the FindFile class in the CSharpTest.Net.IO namespace. This class can enumerate files and folders without unneeded GC allocations and string marshaling.

Usage is simple enough; setting the RaiseOnAccessDenied property to false skips the directories and files the user does not have access to:

    private static long SizeOf(string directory)
    {
        var fcounter = new CSharpTest.Net.IO.FindFile(directory, "*", true, true, true);
        fcounter.RaiseOnAccessDenied = false;

        long size = 0, total = 0;
        fcounter.FileFound +=
            (o, e) =>
            {
                if (!e.IsDirectory)
                {
                    Interlocked.Increment(ref total);
                    size += e.Length;
                }
            };

        Stopwatch sw = Stopwatch.StartNew();
        fcounter.Find();
        Console.WriteLine("Enumerated {0:n0} files totaling {1:n0} bytes in {2:n3} seconds.",
                          total, size, sw.Elapsed.TotalSeconds);
        return size;
    }

For my local C:\ drive this outputs the following:

Enumerated 810,046 files totaling 307,707,792,662 bytes in 232.876 seconds.

Your mileage may vary with drive speed, but this is the fastest method I've found for enumerating files in managed code. The event parameter is a mutable object of type FindFile.FileFoundEventArgs, so do not keep a reference to it: its values change each time the event is raised.
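To see why holding on to a reused event-args object is a problem, here is a minimal sketch with hypothetical `ReusedArgs` and `FakeFinder` types that mimic the pattern described; they are not the CSharpTest.Net API. A single args instance is overwritten for every entry, so any reference kept past the handler ends up showing the last entry, while values copied out inside the handler stay correct.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a reusable event-args object (NOT the library's type).
public sealed class ReusedArgs : EventArgs
{
    public string Name { get; internal set; }
    public long Length { get; internal set; }
}

// Hypothetical finder that raises one event per entry, reusing a single args instance.
public sealed class FakeFinder
{
    public event EventHandler<ReusedArgs> FileFound;
    private readonly ReusedArgs _args = new ReusedArgs();

    public void Raise(IEnumerable<(string Name, long Length)> entries)
    {
        foreach (var (name, length) in entries)
        {
            // Overwrite in place: no per-file allocation, but old references now see new data.
            _args.Name = name;
            _args.Length = length;
            FileFound?.Invoke(this, _args);
        }
    }
}
```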

You might also note that the DateTime values exposed are only in UTC, because the conversion to local time is relatively expensive. Consider working with UTC times to improve performance rather than converting them to local time.
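One way to act on that advice, sketched with a hypothetical `UtcFilter` helper: convert a user-supplied cutoff to UTC once, then compare it against the raw UTC timestamps, instead of calling `ToLocalTime()` on every enumerated entry.

```csharp
using System;
using System.Collections.Generic;

public static class UtcFilter
{
    // Returns the UTC timestamps at or after the cutoff.
    // The cutoff is converted once; ToUniversalTime() leaves a DateTimeKind.Utc value unchanged
    // and treats Local (or Unspecified) values as local time.
    public static List<DateTime> ModifiedSince(IEnumerable<DateTime> modifiedUtc, DateTime cutoff)
    {
        DateTime cutoffUtc = cutoff.ToUniversalTime();
        var hits = new List<DateTime>();
        foreach (DateTime t in modifiedUtc)
        {
            // Plain tick comparison: no per-entry time-zone conversion.
            if (t >= cutoffUtc)
                hits.Add(t);
        }
        return hits;
    }
}
```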

answered Oct 05 '22 04:10 by csharptest.net