I have a piece of code that needs to modify a few bytes towards the end of a file. The problem is that the files are huge: up to 100+ GB.
I need the operation to be as fast as possible, but after hours of Googling it looks like .NET is rather limited here?
I have mostly been trying System.IO.FileStream and know of no other methods. A "reverse" FileStream would do, but I have no idea how to create one (one that writes from the end instead of the beginning).
Here is roughly what I do (note: the time is spent when closing the stream):
using System.IO;

class Program
{
    static void Main(string[] args)
    {
        // Simulate a large file
        int size = 1000 * 1024 * 1024;
        string filename = "blah.dat";
        FileStream fs = new FileStream(filename, FileMode.Create);
        fs.SetLength(size);
        fs.Close();

        // Modify the last byte
        fs = new FileStream(filename, FileMode.Open);
        // If I don't seek, the modification happens instantly
        fs.Seek(-1, SeekOrigin.End);
        fs.WriteByte(255);

        // Now, since I am modifying the last byte,
        // this last step is very slow
        fs.Close();
    }
}
As Darin already noted, this is an artifact of your 'simulation' of a large file.
The delay comes from actually 'filling up' the file, and it only happens the first time: if you repeat the part from //Modify the last byte to fs.Close();, it is very fast.
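For instance, a quick check with System.Diagnostics.Stopwatch (a sketch; it assumes blah.dat was just pre-allocated with SetLength as in the question's code) shows that only the first write-and-close pays the cost:

using System;
using System.Diagnostics;
using System.IO;

class Program
{
    static void Main(string[] args)
    {
        // Assumes blah.dat was just pre-allocated with SetLength,
        // as in the question's code.
        for (int i = 0; i < 2; i++)
        {
            var sw = Stopwatch.StartNew();
            using (var fs = new FileStream("blah.dat", FileMode.Open))
            {
                fs.Seek(-1, SeekOrigin.End);
                fs.WriteByte(255);
            } // Dispose/Close happens here; only pass 1 should be slow
            sw.Stop();
            Console.WriteLine("Pass {0}: {1} ms", i + 1, sw.ElapsedMilliseconds);
        }
    }
}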
I've performed a few tests and the results are a bit confusing. If you create the file and modify it in the same program, it is slow:
static void Main(string[] args)
{
    // Simulate a large file
    int size = 100 * 1024 * 1024;
    string filename = "blah.datn";
    using (var fs = new FileStream(filename, FileMode.Create))
    {
        fs.SetLength(size);
    }

    using (var fs = new FileStream(filename, FileMode.Open))
    {
        fs.Seek(-1, SeekOrigin.End);
        fs.WriteByte(255);
    }
}
But if the file already exists and you only modify the last byte, it is fast:
static void Main(string[] args)
{
    string filename = "blah.datn";
    using (var fs = new FileStream(filename, FileMode.Open))
    {
        fs.Seek(-1, SeekOrigin.End);
        fs.WriteByte(255);
    }
}
Hmmm...
UPDATE:
Please ignore my previous observations and unmark this as an answer because it is all wrong.
Further investigating the issue, I've noticed the following pattern. Suppose that you allocate a file of a given size, filled with zero bytes, like this:
using (var stream = File.OpenWrite("blah.dat"))
{
    stream.SetLength(100 * 1024 * 1024);
}
This operation is very fast and creates a 100 MB file filled with zeros.
Now if, in some other program, you try to modify the last byte, closing the stream will be slow:
using (var stream = File.OpenWrite("blah.dat"))
{
    stream.Seek(-1, SeekOrigin.End);
    stream.WriteByte(255);
}
I have no idea about the internal workings of the file system or how exactly this file is created, but I have the feeling that it is not completely initialized until you try to modify it, and that is why closing the handle is slow.
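If that guess is right, the cost should show up in Close() itself rather than in the write. A small sketch to check this (assuming blah.dat was pre-allocated as above; the Stopwatch timing is the only addition):

using System;
using System.Diagnostics;
using System.IO;

class Program
{
    static void Main(string[] args)
    {
        var stream = File.OpenWrite("blah.dat");
        stream.Seek(-1, SeekOrigin.End);
        stream.WriteByte(255); // returns immediately

        var sw = Stopwatch.StartNew();
        stream.Close(); // if the theory holds, the time is spent here
        sw.Stop();
        Console.WriteLine("Close took {0} ms", sw.ElapsedMilliseconds);
    }
}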
To confirm this I tested in unmanaged code (feel free to fix any aberrations, as my C is very rusty):
#include <stdio.h>

int main(void)
{
    int size = 100 * 1024 * 1024 - 1;
    FILE *handle = fopen("blah.dat", "wb");
    if (handle != NULL) {
        /* Seek past the end and write a single byte;
           the file is extended to exactly 100 MB of zeros. */
        fseek(handle, size, SEEK_SET);
        char buffer[] = {0};
        fwrite(buffer, 1, 1, handle);
        fclose(handle);
    }
    return 0;
}
This behaves the same way as in .NET: it allocates a 100 MB file filled with zeros, and it is very fast.
Now when I try to modify the last byte of this file:
#include <stdio.h>

int main(void)
{
    FILE *handle = fopen("blah.dat", "rb+");
    if (handle != NULL) {
        /* Overwrite the last byte of the existing file. */
        fseek(handle, -1, SEEK_END);
        unsigned char buffer[] = {255};
        fwrite(buffer, 1, 1, handle);
        fclose(handle);
    }
    return 0;
}
The last fclose(handle) is slow. I hope some experts can shed some light here.
It seems, though, that modifying the last byte of a real file (not a sparse one) using the previous methods is very fast.
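One workaround worth trying (an assumption on my part, not something I have benchmarked): if the slow close really comes from physically zero-filling the unwritten region, marking the file as sparse on NTFS should avoid it, since unallocated regions of a sparse file read back as zeros without ever being written. A minimal P/Invoke sketch:

using System;
using System.IO;
using System.Runtime.InteropServices;
using Microsoft.Win32.SafeHandles;

class SparseDemo
{
    // FSCTL_SET_SPARSE control code from winioctl.h
    const uint FSCTL_SET_SPARSE = 0x000900C4;

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern bool DeviceIoControl(
        SafeFileHandle hDevice, uint dwIoControlCode,
        IntPtr lpInBuffer, uint nInBufferSize,
        IntPtr lpOutBuffer, uint nOutBufferSize,
        out uint lpBytesReturned, IntPtr lpOverlapped);

    static void Main(string[] args)
    {
        using (var fs = new FileStream("sparse.dat", FileMode.Create))
        {
            // Mark the file sparse before extending it, so the
            // unwritten region needs no physical zero-fill.
            uint bytesReturned;
            if (!DeviceIoControl(fs.SafeFileHandle, FSCTL_SET_SPARSE,
                    IntPtr.Zero, 0, IntPtr.Zero, 0,
                    out bytesReturned, IntPtr.Zero))
            {
                throw new IOException("FSCTL_SET_SPARSE failed");
            }

            fs.SetLength(100L * 1024 * 1024);
            fs.Seek(-1, SeekOrigin.End);
            fs.WriteByte(255); // closing should now be fast
        }
    }
}

This only applies on NTFS volumes that support sparse files, and whether it actually helps for the 100+ GB case would need measuring.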