Checking if File.Exists() improves write speed

Tags:

c#

My application writes some files to disk, but I've realised I'm overwriting existing files during this process. So I need to check whether the file exists first and then perform some logic.

There could be many files, so I wanted to gauge how much overhead (in terms of time) the check would add. I created a console application to test it.

My code

using System;
using System.Collections.Generic;
using System.IO;

namespace TimeForFileRead
{
    class Program
    {
        static string myPath = "C:\\Users\\DRook\\Desktop\\temp\\";
        static string myPathFile = myPath + "file";
        static void Main(string[] args)
        {
            for (int i = 0; i < 5; i++)
            {
                DoSomeWork();
                Console.WriteLine(" =  =  =  =  =  =============== =  =  =  =  =");
            }
            Console.ReadKey();
        }

        static void DoSomeWork()
        {
            if (!Directory.Exists(myPath))
                Directory.CreateDirectory(myPath);    

            System.Diagnostics.Stopwatch stopWatch = new System.Diagnostics.Stopwatch();

            stopWatch.Start();

            for (int i = 0; i < 1000; i++)
            {
                using (StreamWriter sw = new StreamWriter(myPathFile + i.ToString() + ".txt"))
                {
                    sw.Write(i.ToString());
                }
                i++;
            }

            stopWatch.Stop();

            Console.WriteLine("Write only: " + stopWatch.Elapsed);

            Directory.Delete(myPath, true);
            System.Threading.Thread.Sleep(500);
            Directory.CreateDirectory(myPath);
            System.Threading.Thread.Sleep(500);

            stopWatch.Reset();

            stopWatch.Start();

            for (int i = 0; i < 1000; i++)
            {
                if (!File.Exists(myPathFile + i.ToString() + ".txt"))
                {
                    using (StreamWriter sw = new StreamWriter(myPathFile + i.ToString() + ".txt"))
                    {
                        sw.Write(i.ToString());
                    }
                }
                i++;
            }
            stopWatch.Stop();
            Console.WriteLine("Write and File check: " + stopWatch.Elapsed);
        }
    }
}

So, as you can see, it performs 2 actions. One writes files to disk; the other checks whether the file already exists and, if it doesn't, writes it to disk.

A screenshot of my console window (the results):

[image: console output]

As you can see, the strange thing is that it is nearly always quicker to first check whether the file exists and then write it than it is to write directly to disk. This has left me confused. Surely this makes no sense. Why does this extra overhead improve the speed (considering that File.Exists() will always return false in my code, so the write is never skipped)? I assume there is a fault in my code, but I've looked at this for a while and I can't make sense of it.

Edit

As per the comments, I changed the order around a little, so I now run the version with the File.Exists() check first and then the write-only version. The difference is even more pronounced (although I am now iterating over 10000 instead of 1000 as in the code above):

[image: console output]

Edit 2

@MatthewWatson noted a fault in my code, so I've updated it to ensure the directory is always deleted first. The same issue persists, but it occurs much less often and the difference in speed is more dramatic.

using System;
using System.Collections.Generic;
using System.IO;

namespace TimeForFileRead
{
    class Program
    {
        static string myPath = "C:\\Users\\DRook\\Desktop\\temp\\";
        static string myPathFile = myPath + "file";
        static void Main(string[] args)
        {
            for (int i = 0; i < 5; i++)
            {
                DoSomeWork();
                Console.WriteLine(" =  =  =  =  =  =============== =  =  =  =  =");
            }
            Console.ReadKey();
        }

        static void DoSomeWork()
        {
            if (Directory.Exists(myPath))
                Directory.Delete(myPath, true);

            Directory.CreateDirectory(myPath);

            System.Diagnostics.Stopwatch stopWatch = new System.Diagnostics.Stopwatch();

            stopWatch.Start();

            for (int i = 0; i < 10000; i++)
            {
                using (StreamWriter sw = new StreamWriter(myPathFile + i.ToString() + ".txt"))
                {
                    sw.Write(i.ToString());

                }
                i++;
            }

            stopWatch.Stop();

            Console.WriteLine("Write  took : " + stopWatch.Elapsed);

            Directory.Delete(myPath, true);
            System.Threading.Thread.Sleep(500);
            Directory.CreateDirectory(myPath);
            System.Threading.Thread.Sleep(500);

            stopWatch.Reset();

            stopWatch.Start();

            for (int i = 0; i < 10000; i++)
            {
                if (!File.Exists(myPathFile + i.ToString() + ".txt"))
                {
                    using (StreamWriter sw = new StreamWriter(myPathFile + i.ToString() + ".txt"))
                    {
                        sw.Write(i.ToString());
                    }
                }
                i++;
            }

            stopWatch.Stop();

            Console.WriteLine("Write and check took: " + stopWatch.Elapsed);
        }
    }
}

[image: console output]

Dave asked Oct 29 '13 09:10


2 Answers

Too much code to put in a comment - the short answer is that Exists + Write should generally take longer than just write (even for existing files).

Disk IO is not very predictable (caching, warm-up, machine load, IO queues, HDD/SSD model, etc.), but running tests with a large number of iterations (more than 1000) that take more than a few ms should give you an idea. On my machine, Exists+Write generally takes longer, but there are exceptions too: it could be a page swap interfering or one of the VMs, who knows.

Here's a slightly modified test suite with 4 scenarios:

1. new folder, write only
2. new folder, exists + write
3. existing folder and files (from step 2), write only
4. existing folder and files (from step 2), exists + write

Code below:

using System;
using System.IO;

class FTest
{
    static string myPath = "C:\\Users\\DRook\\Desktop\\temp\\";
    static string myPathFile = myPath + "file";

    public static void test()
    {
        for (int i = 0; i < 5; i++)
        {
            DoSomeWork();
            Console.WriteLine(" =  =  =  =  =  =============== =  =  =  =  =");
        }
        Console.ReadKey();
    }

    public static void testX1(string path, int index)
    {
        using (StreamWriter sw = new StreamWriter(path + index.ToString() + ".txt"))
        {
            sw.Write(index.ToString());
        }
    }

    public static void testX2(string path, int index)
    {
        if (!File.Exists(path + index.ToString() + ".txt"))
        {
            using (StreamWriter sw = new StreamWriter(path + index.ToString() + ".txt"))
            {
                sw.Write(index.ToString());
            }
        }
        else
        {
            using (StreamWriter sw = new StreamWriter(path +"n"+ index.ToString() + ".txt"))
            {
                sw.Write(index.ToString());
            }
        }
    }

    static void runTestMeasure(Action<string, int> func, int count, string message, bool cleanup)
    {
        if (cleanup)
        {
            if (Directory.Exists(myPath)) Directory.Delete(myPath, true);
            System.Threading.Thread.Sleep(500);
            Directory.CreateDirectory(myPath);
            System.Threading.Thread.Sleep(500);
        }

        System.Diagnostics.Stopwatch stopWatch = new System.Diagnostics.Stopwatch();

        stopWatch.Start();

        for (int i = 0; i < count; i++)
        {
            func(myPath,i);
        }

        stopWatch.Stop();

        Console.WriteLine(message+": " + stopWatch.Elapsed);
    }

    static void DoSomeWork()
    {
        int count = 10000;
        runTestMeasure((path, ndx) => { testX1(path, ndx); },count,"Write missing file",true);
        System.Threading.Thread.Sleep(5000);
        runTestMeasure((path, ndx) => { testX2(path, ndx); }, count, "Write+Exists missing file",true);
        System.Threading.Thread.Sleep(5000);
        // Scenario 3 is "write only", so it uses testX1 (no Exists check)
        runTestMeasure((path, ndx) => { testX1(path, ndx); }, count, "Write existing file", false);
        System.Threading.Thread.Sleep(5000);
        runTestMeasure((path, ndx) => { testX2(path, ndx); }, count, "Write+Exists existing file", false);
    }
}

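Check for yourself and see how it behaves on your machine. BTW: there is no point in having i++; inside the for loops. Combined with the loop's own increment it skips every other index, so only half of the files get written.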
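
To illustrate the i++ remark, here is a minimal sketch that counts how many passes a loop makes with and without the extra increment. LoopDemo and CountIterations are made-up names for this illustration and are not part of the test suite above.

```csharp
using System;

static class LoopDemo
{
    // Counts how many times the loop body runs for a given upper bound.
    // extraIncrement mimics the stray i++ at the end of the loop body.
    public static int CountIterations(int upperBound, bool extraIncrement)
    {
        int iterations = 0;
        for (int i = 0; i < upperBound; i++)
        {
            iterations++;
            if (extraIncrement)
                i++; // second increment: the next pass skips an index
        }
        return iterations;
    }

    static void Main()
    {
        Console.WriteLine(CountIterations(1000, false)); // prints 1000
        Console.WriteLine(CountIterations(1000, true));  // prints 500
    }
}
```

So a loop written as "i < 1000" with the extra i++ only touches indices 0, 2, 4, ..., 998. Both timed loops in the question are affected equally, which means this does not explain the timing difference by itself.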

Edit: fixed testX2 code to create a new file (alternate name) if the file exists

bkdc answered Nov 02 '22 23:11


Your tests have no warmup and you are putting the Exists outside of your timings. I guess that when you use the same file it can be cached somewhere at the OS or hardware level. To make this test better:

  • Add warmup
  • Use random/unique filenames for each run
  • Make your tests with 1000, 10000 and 100000 files
  • Make sure your GC is in the same state at the beginning of each test
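
A rough sketch of how those points could fit together is below. It is only one possible arrangement, not a definitive benchmark: WriteBenchmark, WriteFiles and Measure are hypothetical names, the temp-directory location is an assumption, and File.WriteAllText is used in place of the StreamWriter from the question for brevity.

```csharp
using System;
using System.Diagnostics;
using System.IO;

static class WriteBenchmark
{
    // Writes one small file per index; optionally checks File.Exists first.
    // Returns the number of files actually written.
    public static int WriteFiles(string dir, int count, bool checkFirst)
    {
        int written = 0;
        for (int i = 0; i < count; i++)
        {
            string path = Path.Combine(dir, "file" + i + ".txt");
            if (checkFirst && File.Exists(path))
                continue; // skip files that are already there
            File.WriteAllText(path, i.ToString());
            written++;
        }
        return written;
    }

    // Times one run in a fresh, uniquely named directory (assumed to live
    // under the system temp path) and deletes the directory afterwards.
    public static TimeSpan Measure(int count, bool checkFirst)
    {
        string dir = Path.Combine(Path.GetTempPath(), "bench_" + Guid.NewGuid().ToString("N"));
        Directory.CreateDirectory(dir);
        try
        {
            // Bring the GC to a comparable state before each timed run.
            GC.Collect();
            GC.WaitForPendingFinalizers();
            GC.Collect();

            Stopwatch sw = Stopwatch.StartNew();
            WriteFiles(dir, count, checkFirst);
            sw.Stop();
            return sw.Elapsed;
        }
        finally
        {
            Directory.Delete(dir, true);
        }
    }

    static void Main()
    {
        // Warm-up runs, results discarded.
        Measure(100, false);
        Measure(100, true);

        foreach (int count in new[] { 1000, 10000, 100000 })
        {
            Console.WriteLine(count + " files, write only:     " + Measure(count, false));
            Console.WriteLine(count + " files, exists + write: " + Measure(count, true));
        }
    }
}
```

Because every run gets its own directory name, no run can benefit from files or directory entries left over from a previous one. OS-level caching can still influence the numbers, so it is worth repeating the whole thing several times.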

Peter answered Nov 02 '22 23:11