
System.IO.FileStream is super slow on huge files

Tags:

c#

I have a piece of code that needs to be able to modify a few bytes towards the end of a file. The problem is that the files are huge: up to 100+ GB.

I need the operation to be as fast as possible, but after hours of Googling, it looks like .NET is rather limited here.

I have mostly been trying System.IO.FileStream and know of no other methods. A "reverse" filestream would do, but I have no idea how to create one (write from the end instead of the beginning).

Here is roughly what I do (note: the time is spent when closing the stream):

using System.IO;

class Program
{
    static void Main(string[] args)
    {
        //Simulate a large file
        int size = 1000 * 1024 * 1024;
        string filename = "blah.dat";
        FileStream fs = new FileStream(filename, FileMode.Create);
        fs.SetLength(size);
        fs.Close();

        //Modify the last byte
        fs = new FileStream(filename, FileMode.Open);

        //If I don't seek, the modification happens instantly
        fs.Seek(-1, SeekOrigin.End);
        fs.WriteByte(255);

        //Now, since I am modifying the last byte,
        //this last step is very slow
        fs.Close();
    }
}
BlueVoodoo asked Jul 11 '10 07:07


2 Answers

As Darin already noted, this is an artifact of your 'simulation' of a large file.

The delay comes from actually 'filling up' the file, and it only happens the first time. If you repeat the part from //Modify the last byte to fs.Close(), it will be very fast.
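A minimal sketch for checking this yourself, assuming the file already exists and is at least one byte long. The class and method names (TailWriteTimer, TimeLastByteWrite) are made up for illustration and are not part of .NET; only FileStream and Stopwatch are real APIs.

```csharp
using System;
using System.Diagnostics;
using System.IO;

public static class TailWriteTimer
{
    // Overwrites the last byte of an existing file and returns how long the
    // open + seek + write + close sequence took.
    // Hypothetical helper, named for this example only.
    public static TimeSpan TimeLastByteWrite(string path, byte value)
    {
        var watch = Stopwatch.StartNew();
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Write))
        {
            // Throws IOException if the file is empty (cannot seek before the start).
            fs.Seek(-1, SeekOrigin.End);
            fs.WriteByte(value);
        } // Dispose closes the stream, so any delay on close is included in the timing.
        watch.Stop();
        return watch.Elapsed;
    }
}
```

Calling TimeLastByteWrite twice in a row on a file that was just created with SetLength should, if the explanation above holds, report a long first run and a short second one. Actual timings depend on the file system and disk.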

Henk Holterman answered Nov 15 '22 21:11


I've performed a few tests and the results are a bit confusing. If you create the file and modify it in the same program, it is slow:

static void Main(string[] args)
{
    //Simulate a large file
    int size = 100 * 1024 * 1024;
    string filename = "blah.datn";
    using (var fs = new FileStream(filename, FileMode.Create))
    {
        fs.SetLength(size);
    }

    using (var fs = new FileStream(filename, FileMode.Open))
    {
        fs.Seek(-1, SeekOrigin.End);
        fs.WriteByte(255);
    }
}

But if the file already exists and you only try to modify the last byte, it is fast:

static void Main(string[] args)
{
    string filename = "blah.datn";
    using (var fs = new FileStream(filename, FileMode.Open))
    {
        fs.Seek(-1, SeekOrigin.End);
        fs.WriteByte(255);
    }
}

Hmmm...


UPDATE:

Please ignore my previous observations and unmark this as an answer, because it is all wrong.

Investigating the issue further, I noticed the following pattern. Suppose that you allocate a file of a given size, filled with zero bytes, like this:

using (var stream = File.OpenWrite("blah.dat"))
{
    stream.SetLength(100 * 1024 * 1024);
}

This operation is very fast and it creates a 100 MB file filled with zeros.

Now, if you try to modify the last byte from some other program, closing the stream will be slow:

using (var stream = File.OpenWrite("blah.dat"))
{
    stream.Seek(-1, SeekOrigin.End);
    stream.WriteByte(255);
}

I have no idea of the internal workings of the file system or how exactly this file is created, but I have the feeling that it is not completely initialized until you try to modify it, which is why closing the handle is slow.

To confirm this, I tested in unmanaged code (feel free to fix any mistakes, as my C is very rusty):

#include <stdio.h>

int main(void)
{
    int size = 100 * 1024 * 1024 - 1;
    FILE *handle = fopen("blah.dat", "wb");
    if (handle != NULL) {
        /* Seek past the end, then write one byte to extend the file */
        fseek(handle, size, SEEK_SET);
        char buffer[] = {0};
        fwrite(buffer, 1, 1, handle);
        fclose(handle);
    }
    return 0;
}

This behaves the same way as in .NET: it allocates a 100 MB file filled with zeros, and it is very fast.

Now when I try to modify the last byte of this file:

#include <stdio.h>

int main(void)
{
    FILE *handle = fopen("blah.dat", "rb+");
    if (handle != NULL) {
        /* Overwrite the last byte of the existing file */
        fseek(handle, -1, SEEK_END);
        unsigned char buffer[] = {255};
        fwrite(buffer, 1, 1, handle);
        fclose(handle);
    }
    return 0;
}

The last fclose(handle) is slow. I hope some experts will shed some light on this.

It seems, though, that modifying the last byte of a real (non-sparse) file using the previous methods is very fast.
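For comparison, here is a rough sketch of creating such a "real" file in C# by actually writing the zeros instead of only calling SetLength. The helper name CreateZeroFilled and the 1 MB default chunk size are my own choices for illustration. The assumption is that the cost seen on close is the file being filled in, so this version pays that cost up front, during creation.

```csharp
using System;
using System.IO;

public static class RealFileCreator
{
    // Creates a file of the requested length by writing zero bytes in chunks,
    // so every byte has really been written before anyone modifies the tail.
    // Hypothetical helper, named for this example only.
    public static void CreateZeroFilled(string path, long length, int chunkSize = 1024 * 1024)
    {
        if (length < 0) throw new ArgumentOutOfRangeException(nameof(length));
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));

        byte[] zeros = new byte[chunkSize]; // new arrays are zero-initialized
        using (var fs = new FileStream(path, FileMode.Create, FileAccess.Write))
        {
            long remaining = length;
            while (remaining > 0)
            {
                // The final chunk may be shorter than chunkSize.
                int count = (int)Math.Min(remaining, zeros.Length);
                fs.Write(zeros, 0, count);
                remaining -= count;
            }
        }
    }
}
```

Creation is much slower this way, since the whole file is written once, but a later Seek(-1, SeekOrigin.End) followed by WriteByte should then behave like the fast "real file" case described above.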

Darin Dimitrov answered Nov 15 '22 23:11