 

Calculating directory sizes

I'm trying to calculate directory sizes in a way that divides the load so that the user can see counting progress. I thought a logical way to do this would be to first create the directory tree and then make a second pass that sums the lengths of all the files.

What I find unexpected is that the bulk of the time (disk I/O) goes into creating the directory tree, while iterating over the FileInfo[] afterwards completes nearly instantly with virtually no disk I/O.

I've tried both Directory.GetDirectories(), simply building a tree of strings of the directory names, and using a DirectoryInfo object, and either way building the tree takes the bulk of the I/O time (reading the MFT, of course) compared to going over all the FileInfo.Length values for the files in each directory.
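
For illustration, here is a simplified sketch of the two-pass shape I'm describing (the path and structure are illustrative, not the exact repository code):

```csharp
using System;
using System.IO;

class Program
{
    static void Main()
    {
        var root = new DirectoryInfo(@"C:\SomeFolder"); // placeholder path

        // Pass 1: build the directory tree up front.
        // This is where the bulk of the disk I/O (MFT reads) happens.
        DirectoryInfo[] tree = root.GetDirectories("*", SearchOption.AllDirectories);

        // Pass 2: sum FileInfo.Length for the files in each directory.
        // By this point the file system metadata is cached, so this pass is fast.
        long total = 0;
        foreach (FileInfo file in root.GetFiles())
            total += file.Length;
        foreach (DirectoryInfo dir in tree)
            foreach (FileInfo file in dir.GetFiles())
                total += file.Length;

        Console.WriteLine("Total: {0} bytes", total);
    }
}
```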

I suppose there's no way to significantly reduce the I/O needed to build the tree; I'm mostly just wondering why this operation takes so much more time than going over the far more numerous files?

Also, could anyone recommend a non-recursive way to tally things up? It seems I need to split up the enumeration and balance it in order to make the size tallying more responsive. Spawning a thread for each subdirectory off the base and letting scheduler competition balance things out probably wouldn't work very well, would it?

EDIT: Repository for this code

asked Jun 26 '12 by j.i.h.


1 Answer

You can use Parallel.ForEach to run the directory size calculation in parallel. Get the top-level subdirectories with GetDirectories and run Parallel.ForEach over them. Keep a running total in a shared variable and display that to the user; each parallel iteration increments the same variable, so if needed use lock() to synchronize between the parallel executions.
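
A rough sketch of that approach (the names and per-file accounting here are illustrative; Interlocked.Add is used in place of lock() to keep the shared total consistent):

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

class ParallelDirectorySize
{
    // Shared running total that the UI can read to show counting progress.
    static long _total;

    static long Calculate(string rootPath)
    {
        _total = 0;

        // Files sitting directly in the root directory.
        foreach (FileInfo file in new DirectoryInfo(rootPath).GetFiles())
            Interlocked.Add(ref _total, file.Length);

        // One parallel body per top-level subdirectory.
        // (Error handling for inaccessible directories is omitted for brevity.)
        Parallel.ForEach(Directory.GetDirectories(rootPath), dir =>
        {
            foreach (FileInfo file in
                     new DirectoryInfo(dir).GetFiles("*", SearchOption.AllDirectories))
            {
                // Interlocked.Add plays the role of lock() here: it keeps the
                // shared counter consistent across the parallel iterations.
                Interlocked.Add(ref _total, file.Length);
            }
        });

        return _total;
    }
}
```

While the calculation runs, the UI thread can poll the running total (for example with Interlocked.Read(ref _total)) on a timer to display progress; if you would rather use lock(), guard the increment with a lock on a shared object instead.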

answered Sep 28 '22 by loopedcode