This earlier SO question talks about how to retrieve all files in a directory tree that match one of multiple extensions.
For example: retrieve all files within C:\ and all subdirectories, matching *.log, *.txt, *.dat.
The accepted answer was this:
var files = Directory.GetFiles("C:\\path", "*.*", SearchOption.AllDirectories)
.Where(s => s.EndsWith(".mp3") || s.EndsWith(".jpg"));
This strikes me as being quite inefficient. If you were searching on a directory tree that contains thousands of files (it uses SearchOption.AllDirectories), every single file in the specified directory tree is loaded into memory, and only then are mismatches removed. (Reminds me of the "paging" offered by ASP.NET datagrids.)
Unfortunately the standard System.IO.DirectoryInfo.GetFiles method only accepts one filter at a time.
It could just be my lack of LINQ knowledge, so first: is it actually inefficient in the way I describe?
Secondly, is there a more efficient way to do it, both with and without LINQ (without resorting to multiple calls to GetFiles)?
I shared your problem and I found the solution in Matthew Podwysocki's excellent post at codebetter.com.
He implemented a solution using native methods that lets you pass a predicate to his GetFiles implementation. He also built it with yield statements, so file names are produced one at a time rather than collected into an array first, which keeps memory use to a minimum.
With his code you can write something like the following:
var allowedExtensions = new HashSet<string> { ".jpg", ".mp3" };
var files = GetFiles(
"C:\\path",
SearchOption.AllDirectories,
fn => allowedExtensions.Contains(Path.GetExtension(fn))
);
The files variable then refers to a lazily evaluated sequence that returns the matching files only as you enumerate it (deferred execution).
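For readers who do not want to pull in native interop, here is a minimal managed sketch of the same idea. It is not Podwysocki's code: the class name LazyFileSearch and this GetFiles helper are hypothetical, and it assumes .NET 4 or later, where Directory.EnumerateFiles streams results instead of building the whole array up front.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper (not from the linked post): a managed stand-in
// for a predicate-based GetFiles with deferred execution.
public static class LazyFileSearch
{
    public static IEnumerable<string> GetFiles(
        string path,
        SearchOption option,
        Func<string, bool> predicate)
    {
        // Directory.EnumerateFiles yields paths as the directory tree is walked,
        // so no array holding every file name is ever allocated.
        foreach (var file in Directory.EnumerateFiles(path, "*", option))
        {
            // Only paths accepted by the caller's predicate are passed on.
            if (predicate(file))
            {
                yield return file;
            }
        }
    }
}
```

It can be called like the snippet above, with LazyFileSearch.GetFiles in place of GetFiles. Constructing the set as new HashSet<string>(StringComparer.OrdinalIgnoreCase) would also let ".JPG" match ".jpg". One caveat of this sketch: with SearchOption.AllDirectories, an inaccessible subdirectory throws UnauthorizedAccessException partway through the enumeration, so real code may need to walk the tree directory by directory and handle that case.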
You are right about the memory consumption. However, I think that's a fairly premature optimization. Loading an array of a few thousand strings is no problem at all, for either performance or memory consumption. Reading a directory containing that many files, however, is a problem no matter how you store or filter the file names: it will always be relatively slow.