
File.Copy() performance if a file already exists

I have a project that runs as a scheduled task every 5 minutes. Among other things, the project runs through hundreds of images and copies them to a network drive in this manner:

foreach (string file in Files)
{
    string Control = Path.GetFileNameWithoutExtension(file);
    File.SetAttributes(file, FileAttributes.Normal);
    try
    {
        // overwrite: false, so an existing destination file throws an IOException
        File.Copy(file, destinationFolder + "\\" + Control + @".pdf", false);
    }
    catch (Exception err)
    {
        Console.WriteLine(err.ToString());
    }
}

The "false" argument, of course, is telling it NOT to overwrite a file if it already exists.

Is this faster/better practice than first checking if the file already exists and then only copying if the file does not exist? (see below)

foreach (string file in Files)
{
    string ControlNumber = Path.GetFileNameWithoutExtension(file);
    if (File.Exists(destinationFolder + "\\" + ControlNumber + ".pdf") == false)
    {
        File.SetAttributes(file, FileAttributes.Normal);
        File.Copy(file, destinationFolder + "\\" + ControlNumber + @".pdf");
    }
}

My gut tells me that the first is the better way. However, I'm relatively new to programming and would love to know which is better, faster, more widely accepted, etc.

It may or may not be helpful to know that the remote drive/folder I am copying to contains 4TB of image data (millions of images).

Asked by Milne on Oct 24 '13 at 22:10


2 Answers

Tested this on a local drive with the following results:

1000 times checking if file exists, then doing a File.Copy if it does not: 28.29 milliseconds

1000 times doing a File.Copy with overwrite set to false in a try, catch: 317.13 milliseconds

Tested on a network drive with the following results:

1000 times checking if file exists, then doing a File.Copy if it does not: 203.48 milliseconds

1000 times doing a File.Copy with overwrite set to false in a try, catch: 14758.74 milliseconds

Based on that, I think it's clear that checking whether the file exists first is more efficient than attempting the copy and catching the exception.
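
The answer does not show its test code, so the following is only a minimal sketch of how such a comparison could be timed. It assumes the destination file already exists on every iteration (the answer does not state this), and the names CopyBenchmark, CheckThenCopy, CopyAndCatch and TimeMs, as well as the temp-folder paths, are hypothetical.

```csharp
using System;
using System.Diagnostics;
using System.IO;

public static class CopyBenchmark
{
    // Variant 1: check first, copy only when the destination is missing.
    public static void CheckThenCopy(string source, string dest)
    {
        if (!File.Exists(dest))
        {
            File.Copy(source, dest);
        }
    }

    // Variant 2: always attempt the copy and swallow the failure.
    public static void CopyAndCatch(string source, string dest)
    {
        try
        {
            File.Copy(source, dest, false);
        }
        catch (IOException)
        {
            // Destination already exists (or another I/O error occurred).
        }
    }

    // Runs the given copy routine repeatedly and returns the elapsed milliseconds.
    public static double TimeMs(Action<string, string> copy, string source, string dest, int iterations)
    {
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            copy(source, dest);
        }
        sw.Stop();
        return sw.Elapsed.TotalMilliseconds;
    }

    public static void Main()
    {
        // Hypothetical location: point destDir at a network share to repeat the second test.
        string destDir = Path.Combine(Path.GetTempPath(), "copy-benchmark");
        Directory.CreateDirectory(destDir);
        string source = Path.Combine(destDir, "source.pdf");
        string dest = Path.Combine(destDir, "dest.pdf");
        File.WriteAllText(source, "test");
        File.Copy(source, dest, true); // assumption: the destination exists for every timed call

        Console.WriteLine("Exists + Copy: {0:F2} ms", TimeMs(CheckThenCopy, source, dest, 1000));
        Console.WriteLine("Copy + catch:  {0:F2} ms", TimeMs(CopyAndCatch, source, dest, 1000));
    }
}
```

Absolute timings will vary with the machine, the share and the proportion of files that already exist, so it is worth running something like this against the real destination before deciding.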

Answered by Ben Walker on Sep 20 '22 at 05:09


You are much more likely to see better performance using the first case (though make sure you wrap the call to File.Copy in a try..catch, since it will throw an IOException if the file already exists). Your first example lets the underlying platform handle the check for file existence, which it may optimize in ways that your code cannot. Because every call you make costs a round trip across the network, reducing the number of calls should improve performance.

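In addition, the remote system might change between your call to File.Exists and your call to File.Copy. If a file is created in that window, the unguarded File.Copy in your second example throws an IOException (the two-argument overload does not overwrite), and it would silently replace the new file if you ever passed true for overwrite.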

A much better approach would be to create a list of the files on the remote machine first and then copy only the files that don't already exist. When you do this copy, use your first method with the try..catch. This ensures that you don't waste time trying to copy files that were there when you started, and also ensures that you don't accidentally overwrite a file that is created after you start copying things across.
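
As a rough sketch of that approach, assuming the destination is a single flat folder of .pdf files on a case-insensitive file system: the class and method names (IncrementalCopy, CopyMissing) are hypothetical, while destinationFolder and the .pdf naming follow the question.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class IncrementalCopy
{
    // Copies each source file to destinationFolder as <name>.pdf unless a file
    // with that name was already present when the listing was taken.
    // Returns the number of files actually copied.
    public static int CopyMissing(IEnumerable<string> files, string destinationFolder)
    {
        // One directory listing instead of one existence check per file.
        // Assumption: names compare case-insensitively, as on a Windows share.
        var existing = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        foreach (string path in Directory.EnumerateFiles(destinationFolder, "*.pdf"))
        {
            existing.Add(Path.GetFileName(path));
        }

        int copied = 0;
        foreach (string file in files)
        {
            string name = Path.GetFileNameWithoutExtension(file) + ".pdf";
            if (existing.Contains(name))
            {
                continue; // already there when the listing was taken
            }

            try
            {
                File.SetAttributes(file, FileAttributes.Normal);
                // overwrite: false guards against files created after the listing.
                File.Copy(file, Path.Combine(destinationFolder, name), false);
                copied++;
            }
            catch (IOException err)
            {
                Console.WriteLine(err.Message);
            }
        }
        return copied;
    }
}
```

A call such as IncrementalCopy.CopyMissing(Files, destinationFolder) would replace the loop in the question. With millions of files in the destination, the listing itself is not free, so it is worth measuring whether one large listing actually beats per-file checks in your environment.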

Answered by seawolf on Sep 22 '22 at 05:09