
Why is writing many small byte arrays to a file faster than writing one big array?

I ran a test to see whether there is a difference between the time it takes to write a 1GB file to disk from a single byte array and the time it takes to write another 1GB file from 1024 arrays (1MB each).

Test Writing many arrays
331.6902 ms
Test Writing big array
14756.7559 ms

For this test, the "many arrays" case actually uses a single byte[1024 * 1024] array that I write 1024 times in a for loop. The "big array" is just a 1GB byte array filled with the same i % 255 pattern.

Here's what the code looks like:

Console.WriteLine("Test Writing many arrays");

byte[] data = new byte[1048576];

for (int i = 0; i < 1048576; i++)
    data[i] = (byte)(i % 255);

FileStream file = new FileStream("test.txt", FileMode.Create);

sw1.Restart();

for (int i = 0; i < 1024; i++ )
     file.Write(data, 0, 1048576);

file.Close();
sw1.Stop();
s1 = sw1.Elapsed;
Console.WriteLine(s1.TotalMilliseconds);

Console.WriteLine("Test Writing big array");


 byte[] data2 = new byte[1073741824];

 for (int i = 0; i < 1073741824; i++)
      data2[i] = (byte)(i % 255);

 FileStream file2 = new FileStream("test2.txt", FileMode.Create);

 sw1.Restart();

 file2.Write(data2, 0, 1073741824);

 file2.Close();
 sw1.Stop();

 s1 = sw1.Elapsed;
 Console.WriteLine(s1.TotalMilliseconds);

I included file.Close() inside the timed part, since it calls Flush() and writes any buffered data out to disk.
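For reference, an equivalent way to make the flush explicit instead of relying on Close() (FileStream.Flush(true) additionally asks the OS to write its own buffers through to disk):

using (var fs = new FileStream("test.txt", FileMode.Create))
{
    fs.Write(data, 0, data.Length);
    fs.Flush(true); // flush FileStream's buffer and ask the OS to flush to disk
}
// Dispose() at the end of the using block calls Close() for us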

The resulting files are the exact same size.

I thought maybe C# could see that I always use the same array and optimize the iteration/writing process, but the result isn't just 2-3 times faster, it's about 45 times faster... Why?

Alex Rose, asked Jul 11 '12 14:07

2 Answers

I think the major reason for the big difference is that the OS manages to cache almost the entire 1GB write that you do in small chunks.

You need to change the way the benchmark is set up: the code should write the same data, the first time in 1024 chunks and the second time as one chunk. You also need to turn off the OS write cache by specifying FileOptions.WriteThrough, like this:

var sw1 = new Stopwatch();
Console.WriteLine("Test Writing many arrays");
var data = new byte[1073741824];
for (var i = 0; i < 1073741824; i++)
    data[i] = (byte)(i % 255);
// bufferSize of 8 keeps FileStream's internal buffer out of the picture;
// FileOptions.WriteThrough bypasses the OS write cache.
// FileSystemRights comes from System.Security.AccessControl.
var file = new FileStream("c:\\temp\\__test1.txt", FileMode.Create, FileSystemRights.WriteData, FileShare.None, 8, FileOptions.WriteThrough);
sw1.Restart();
for (int i = 0; i < 1024; i++)
    file.Write(data, i * 1048576, 1048576); // write a distinct 1MB chunk each pass
file.Close();
sw1.Stop();
var s1 = sw1.Elapsed;
Console.WriteLine(s1.TotalMilliseconds);

Console.WriteLine("Test Writing big array");
var file2 = new FileStream("c:\\temp\\__test2.txt", FileMode.Create, FileSystemRights.WriteData, FileShare.None, 8, FileOptions.WriteThrough);
sw1.Restart();
file2.Write(data, 0, 1073741824);
file2.Close();
sw1.Stop();
s1 = sw1.Elapsed;
Console.WriteLine(s1.TotalMilliseconds);

When you run this code, the results look as follows:

Test Writing many arrays
5234.5885
Test Writing big array
5032.3626
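Note on the snippet above: the tiny bufferSize of 8 takes FileStream's internal buffer out of the equation (each 1MB write is larger than the buffer, so it goes straight to the OS), so together with WriteThrough both timings measure actual disk throughput.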
Sergey Kalinichenko, answered Sep 20 '22 04:09


The reason is likely to be that the single 1MB array was being held in main memory, while the 1GB array had been swapped out to disk.

Therefore, when writing the single array 1024 times, you were writing from memory to disk. If the destination file is contiguous, the HDD head doesn't have to move far during this process.

When writing the 1GB array once, you were reading from disk into memory and then writing back to disk, in all likelihood resulting in at least two HDD head movements for each write: first to read the block from the swap file, then back to the destination file to write it.
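One way to test this hypothesis is to touch every page of the 1GB array just before starting the stopwatch, so the data is resident in RAM when the write begins; if the two timings then converge, paging was the cause. A minimal sketch (not part of the original benchmark, names reused from the question's code):

var data2 = new byte[1073741824];
for (int i = 0; i < data2.Length; i++)
    data2[i] = (byte)(i % 255);

// Pre-fault: read one byte per 4 KB page so the whole array is resident
// in RAM before timing starts. Printing the checksum keeps the JIT from
// optimizing the loop away.
long checksum = 0;
for (int i = 0; i < data2.Length; i += 4096)
    checksum += data2[i];
Console.WriteLine(checksum);

var sw = Stopwatch.StartNew();
using (var f = new FileStream("test2.txt", FileMode.Create))
    f.Write(data2, 0, data2.Length);
sw.Stop();
Console.WriteLine(sw.Elapsed.TotalMilliseconds);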

Ben, answered Sep 21 '22 04:09