We have encountered an unexpected performance issue when traversing directories looking for files using a wildcard pattern.
We have 180 folders each containing 10,000 files. A command line search using dir <pattern> /s
completes almost instantly (under 0.25 seconds). However, the same search from our application takes 3 to 4 seconds.
We initially tried using System.IO.DirectoryInfo.GetFiles() with SearchOption.AllDirectories and have now tried the Win32 API calls FindFirstFile() and FindNextFile().
Profiling our code indicates that the vast majority of execution time is spent in these calls.
Our code is based on the following blog post:
http://codebetter.com/blogs/matthew.podwysocki/archive/2008/10/16/functional-net-fighting-friction-in-the-bcl-with-directory-getfiles.aspx
We found this to be slow, so we updated the GetFiles function to take a string search pattern rather than a predicate.
Can anyone shed any light on what might be wrong with our approach?
In my tests, using FindFirstFileEx with FindExInfoBasic and FIND_FIRST_EX_LARGE_FETCH is much faster than the plain FindFirstFile.
Scanning 20 folders with ~300,000 files took 661 seconds with FindFirstFile and 11 seconds with FindFirstFileEx. Subsequent calls to the same folders took less than a second.
HANDLE h=FindFirstFileEx(search.c_str(), FindExInfoBasic, &data, FindExSearchNameMatch, NULL, FIND_FIRST_EX_LARGE_FETCH);
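For context, here is a rough sketch of how that call might fit into a recursive wildcard search. It is not the code I timed, just one way to arrange it. The names find_files, open_search, join_path and is_dot_entry are made up for illustration, and the two-pass layout (matching files first, then subdirectories) is my assumption about how to combine a pattern with recursion. The Win32 part is guarded by _WIN32 and needs Windows 7 / Server 2008 R2 or later for FindExInfoBasic and FIND_FIRST_EX_LARGE_FETCH.

```cpp
#include <string>
#include <vector>

#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: true for the "." and ".." pseudo-entries that a
// directory listing returns and that must not be recursed into.
inline bool is_dot_entry(const std::wstring& name) {
    return name == L"." || name == L"..";
}

// Hypothetical helper: joins a directory and a child name with one backslash.
inline std::wstring join_path(const std::wstring& dir, const std::wstring& name) {
    if (dir.empty()) return name;
    const wchar_t last = dir.back();
    if (last == L'\\' || last == L'/') return dir + name;
    return dir + L"\\" + name;
}

#ifdef _WIN32
// Opens a search using the two options from the answer above:
// FindExInfoBasic skips the 8.3 short name lookup, and
// FIND_FIRST_EX_LARGE_FETCH asks for a larger directory query buffer.
inline HANDLE open_search(const std::wstring& spec, WIN32_FIND_DATAW& data) {
    return FindFirstFileExW(spec.c_str(), FindExInfoBasic, &data,
                            FindExSearchNameMatch, NULL,
                            FIND_FIRST_EX_LARGE_FETCH);
}

// Appends to `out` the full path of every file under `dir` (recursively)
// whose name matches the wildcard `pattern`, e.g. L"*.log".
inline void find_files(const std::wstring& dir, const std::wstring& pattern,
                       std::vector<std::wstring>& out) {
    WIN32_FIND_DATAW data;

    // Pass 1: let the file system filter by pattern in this directory.
    HANDLE h = open_search(join_path(dir, pattern), data);
    if (h != INVALID_HANDLE_VALUE) {
        do {
            if (!(data.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY))
                out.push_back(join_path(dir, data.cFileName));
        } while (FindNextFileW(h, &data));
        FindClose(h);
    }

    // Pass 2: enumerate everything to discover subdirectories, then recurse.
    h = open_search(join_path(dir, L"*"), data);
    if (h == INVALID_HANDLE_VALUE) return;
    do {
        const bool is_dir = (data.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY) != 0;
        // Skip reparse points (junctions, symlinks) to avoid cycles.
        const bool is_link = (data.dwFileAttributes & FILE_ATTRIBUTE_REPARSE_POINT) != 0;
        if (is_dir && !is_link && !is_dot_entry(data.cFileName))
            find_files(join_path(dir, data.cFileName), pattern, out);
    } while (FindNextFileW(h, &data));
    FindClose(h);
}
#endif
```

On Windows this would be called as something like find_files(L"C:\\data", L"*.log", results) with a std::vector<std::wstring> results. Whether the second pass costs more than filtering names yourself in a single "*" pass probably depends on how many files match, so it is worth measuring both on your own data.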