I'm trying to calculate directory sizes in a way that divides the load so the user can see counting progress. I thought a logical way to do this would be to first build the directory tree and then make a second pass summing the lengths of all the files.
What surprises me is that the bulk of the time (disk I/O) goes into building the directory tree; iterating over the resulting FileInfo[] afterwards finishes almost instantly with virtually no disk I/O.
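For concreteness, the two-phase approach described above looks roughly like this (a minimal sketch; the Node type and the root path are hypothetical, and error handling for access-denied directories is omitted):

```csharp
using System;
using System.Collections.Generic;
using System.IO;

class Node
{
    public FileInfo[] Files;                       // snapshot of the files in this directory
    public List<Node> Children = new List<Node>(); // subdirectories
}

class TwoPhase
{
    // Phase 1: build the directory tree. This is where nearly all the
    // disk I/O (MFT reads) happens.
    static Node BuildTree(DirectoryInfo dir)
    {
        var node = new Node { Files = dir.GetFiles() };
        foreach (DirectoryInfo sub in dir.GetDirectories())
            node.Children.Add(BuildTree(sub));
        return node;
    }

    // Phase 2: walk the tree summing FileInfo.Length. This finishes
    // almost instantly because the metadata is already in memory.
    static long SumSizes(Node node)
    {
        long total = 0;
        foreach (FileInfo file in node.Files)
            total += file.Length;
        foreach (Node child in node.Children)
            total += SumSizes(child);
        return total;
    }

    static void Main()
    {
        Node root = BuildTree(new DirectoryInfo(@"C:\SomeFolder"));
        Console.WriteLine($"{SumSizes(root):N0} bytes");
    }
}
```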
I've tried both Directory.GetDirectories() (simply building a tree of strings of the directory names) and a DirectoryInfo object, and either way the bulk of the I/O time is spent building the tree (reading the MFT, of course) rather than going over FileInfo.Length for the files in each directory.
I assume there's no way to significantly reduce the I/O needed to build the tree; I'm mainly wondering why this operation takes so much more time than going over the far more numerous files.
Also, can anyone recommend a non-recursive way to tally things up? It seems I need to split up the enumeration and balance it to make the size tallying more responsive; a sketch of an iterative approach follows below. Spawning a thread for each subdirectory off the base and letting scheduler competition balance things out probably wouldn't work well, would it?
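For what it's worth, here is a minimal non-recursive sketch using an explicit stack, with a hypothetical reportProgress callback for updating the UI; it illustrates the idea rather than reproducing the poster's actual code:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class DirectorySizer
{
    // Walks 'root' iteratively with an explicit stack instead of recursion
    // and reports the running total after each directory is processed.
    public static long GetSize(string root, Action<long> reportProgress)
    {
        long total = 0;
        var pending = new Stack<string>();
        pending.Push(root);

        while (pending.Count > 0)
        {
            var dir = new DirectoryInfo(pending.Pop());
            try
            {
                foreach (FileInfo file in dir.GetFiles())
                    total += file.Length;
                foreach (DirectoryInfo sub in dir.GetDirectories())
                    pending.Push(sub.FullName);
            }
            catch (UnauthorizedAccessException)
            {
                // Skip directories we aren't allowed to read.
            }
            reportProgress(total); // one UI update per directory visited
        }
        return total;
    }
}
```

Because the loop processes one directory per iteration, the same work queue can also be drained in chunks from a timer tick or a background worker to keep the UI responsive, e.g. `DirectorySizer.GetSize(@"C:\SomeFolder", t => label.Text = $"{t:N0} bytes so far")`.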
EDIT: Repository for this code
You can use Parallel.ForEach to run the directory size calculation in parallel. Get the top-level directories with GetDirectories and run Parallel.ForEach over them, keeping a shared running total that you display to the user. Since each parallel worker increments the same variable, synchronize the updates, for example with lock() or Interlocked.
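A minimal sketch of that idea, assuming a console program and a hypothetical root path; this version uses Interlocked.Add rather than lock() since the shared state is a single long counter:

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

class ParallelSizer
{
    static long _total; // shared running total, updated atomically

    static void Main()
    {
        string root = @"C:\SomeFolder"; // hypothetical root path

        // One parallel body per top-level subdirectory; the thread pool
        // balances the workers.
        Parallel.ForEach(Directory.GetDirectories(root), dir =>
        {
            long localSum = 0;
            // Note: EnumerateFiles can throw UnauthorizedAccessException
            // on protected subtrees, which would abort this worker.
            foreach (string path in Directory.EnumerateFiles(dir, "*", SearchOption.AllDirectories))
            {
                try { localSum += new FileInfo(path).Length; }
                catch (IOException) { /* skip files that vanish mid-scan */ }
            }
            long runningTotal = Interlocked.Add(ref _total, localSum);
            Console.WriteLine($"{dir}: {runningTotal:N0} bytes counted so far");
        });

        // Files sitting directly under the root aren't covered by the workers.
        foreach (string path in Directory.EnumerateFiles(root))
            _total += new FileInfo(path).Length;

        Console.WriteLine($"Total: {_total:N0} bytes");
    }
}
```

Accumulating into localSum and publishing once per directory keeps contention on the shared counter low; calling Interlocked.Add per file would also be correct, just slower.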