I have a console application that is going to take about 625 days to complete, unless there is a way to make it faster.
First off, I am working in a directory that contains around 4,000,000 files, if not more. I'm also working with a database that has a row for each file, and then some.
Working with the SQL is relatively fast. The bottleneck is File.Move(): each move takes 18 seconds to complete.
Is there a faster way than File.Move()?
This is the bottleneck:
File.Move(Path.Combine(location, fileName), Path.Combine(rootDir, fileYear, fileMonth, fileName));
All of the other code runs pretty fast. All I need to do is move one file to a new location and then update the database location field.
I can show other code if needed, but really the above is the only current bottleneck.
Generally, moving a file within the same drive is faster than copying it, because a move only changes the directory links and leaves the data where it is on the physical device. Copying actually reads the data and writes it to another place, and hence takes more time.
Copying does the same read-and-write work whether the destination is on the same drive or on a different one. Moving a file to a different drive (or using the Cut command) is a special case: you are basically creating a copy of the file in the new location and then deleting the original, so it costs about as much as a copy.
File.Move(String, String, Boolean) moves a specified file to a new location, with the options to specify a new file name and to overwrite the destination file if it already exists.
It turns out switching from File.Move to setting up a FileInfo and using .MoveTo increased the speed significantly.
It will run in about 35 days now as opposed to 625 days.
FileInfo fileinfo = new FileInfo(Path.Combine(location, fileName));
fileinfo.MoveTo(Path.Combine(rootDir, fileYear, fileMonth, fileName));
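Results like this depend heavily on the machine, the file system, and the .NET version, so it is worth measuring both calls on a handful of files before committing to a multi-day run. Below is a minimal timing sketch using Stopwatch; the MoveTimer class and its Time method are hypothetical helpers, not part of the original code.

```csharp
using System;
using System.Diagnostics;

public static class MoveTimer
{
    // Runs the given action once and returns how long it took.
    public static TimeSpan Time(Action move)
    {
        var stopwatch = Stopwatch.StartNew();
        move();
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }
}
```

For example, MoveTimer.Time(() => File.Move(source, target)) can be compared against MoveTimer.Time(() => new FileInfo(source).MoveTo(target)) on different sample files, where source and target are placeholder paths.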
18 seconds isn't really unusual. NTFS does not perform well when you have a lot of files in a single directory. When you ask for a file, it has to search the directory's data structure, and the cost of that search grows with the number of entries. With 1,000 files, that doesn't take too long. With 10,000 files you notice it. With 4 million files... yeah, it takes a while.
You can probably do this even faster if you pre-load all of the directory entries into memory. Then rather than calling the FileInfo constructor for each file, you just look it up in your dictionary.
Something like:
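// Requires: using System; using System.Collections.Generic; using System.IO;
// Read the whole directory once ("path" is the source directory).
var dirInfo = new DirectoryInfo(path);
var files = dirInfo.GetFileSystemInfos();
// Index the entries by full path; NTFS paths are case-insensitive.
var cache = new Dictionary<string, FileSystemInfo>(StringComparer.OrdinalIgnoreCase);
foreach (var f in files)
{
    cache.Add(f.FullName, f);
}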
Now when you get a name from the database, you can just look it up in the dictionary. That might very well be faster than trying to get it from the disk each time.
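As a rough illustration of that lookup, here is a sketch that builds the cache once and then moves a file by name into a year/month folder. The FileMover class, BuildCache, and TryMoveCached are hypothetical names, and the sketch assumes the database supplies the file name, year, and month as strings, as in the question.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class FileMover
{
    // Reads the source directory once and indexes every file by full path.
    // Assumes a case-insensitive file system such as NTFS.
    public static Dictionary<string, FileInfo> BuildCache(string location)
    {
        var cache = new Dictionary<string, FileInfo>(StringComparer.OrdinalIgnoreCase);
        foreach (var file in new DirectoryInfo(location).EnumerateFiles())
        {
            cache[file.FullName] = file;
        }
        return cache;
    }

    // Moves one cached file into rootDir/fileYear/fileMonth.
    // Returns false (and an empty newPath) when the file is not in the cache.
    public static bool TryMoveCached(Dictionary<string, FileInfo> cache, string location,
        string fileName, string rootDir, string fileYear, string fileMonth, out string newPath)
    {
        newPath = string.Empty;
        var key = Path.GetFullPath(Path.Combine(location, fileName));
        if (!cache.TryGetValue(key, out var file))
        {
            return false;
        }

        var targetDir = Path.Combine(rootDir, fileYear, fileMonth);
        Directory.CreateDirectory(targetDir); // does nothing if it already exists
        newPath = Path.Combine(targetDir, fileName);
        file.MoveTo(newPath);
        cache.Remove(key); // the entry no longer lives in the source directory
        return true;
    }
}
```

The returned newPath is what would be written to the database location field. Holding 4,000,000 FileInfo objects in memory is not free, so whether this beats constructing a FileInfo per file is something to measure rather than assume.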
You can move files in parallel. Using Directory.EnumerateFiles also gives you a lazily loaded list of files (of course, I have not tested it with 4,000,000 files):
var numberOfConcurrentMoves = 2;
var moves = new List<Task>();
var sourceDirectory = "source-directory";
var destinationDirectory = "destination-directory";
foreach (var filePath in Directory.EnumerateFiles(sourceDirectory))
{
    // Start the move on the thread pool.
    var move = Task.Run(() =>
    {
        File.Move(filePath, Path.Combine(destinationDirectory, Path.GetFileName(filePath)));
        // Update the database row for this file here.
    });
    moves.Add(move);
    if (moves.Count >= numberOfConcurrentMoves)
    {
        Task.WaitAll(moves.ToArray());
        moves.Clear();
    }
}
Task.WaitAll(moves.ToArray());
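The same idea can also be sketched with Parallel.ForEach, which caps concurrency through ParallelOptions.MaxDegreeOfParallelism instead of batching tasks by hand. This is an untested-at-scale sketch, not a drop-in replacement: the ParallelMover class and MoveAll method are hypothetical names, and it assumes the destination directory is on the same volume.

```csharp
using System.IO;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelMover
{
    // Moves every file from sourceDirectory into destinationDirectory,
    // running at most maxConcurrentMoves moves at a time.
    // Returns the number of files moved.
    public static int MoveAll(string sourceDirectory, string destinationDirectory, int maxConcurrentMoves)
    {
        Directory.CreateDirectory(destinationDirectory);
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxConcurrentMoves };
        var moved = 0;

        Parallel.ForEach(Directory.EnumerateFiles(sourceDirectory), options, filePath =>
        {
            File.Move(filePath, Path.Combine(destinationDirectory, Path.GetFileName(filePath)));
            // Update the database row for this file here.
            Interlocked.Increment(ref moved);
        });

        return moved;
    }
}
```

Whether more than one concurrent move helps at all depends on the disk, so it is worth trying a small value such as 2 first and comparing it against a plain sequential loop.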