Explanation/information sought: Windows write I/O performance with "fsync" (FlushFileBuffers)

This question is a follow-up to an earlier question I posted, "Windows fsync (FlushFileBuffers) performance with large files", where I found a possible solution but also new questions.

While benchmarking different scenarios for fsynced writes, I found a number of surprising results. I am hoping someone can help explain or point me in the direction of information that explains these results.

The benchmark writes blocks of random data (pages of 4096 bytes) sequentially to a file in batches of 8 pages (32 KB), flushing the writes after each batch. It writes 200,000 pages in total, amounting to 800 MB and 25,000 flushes. The file's size is set to its final length before the writes begin.
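A minimal sketch of the benchmark loop described above, written in Python rather than the question's .NET code (which is not reproduced here). Assumptions: the pages hold random bytes, the function name `run_sequential_benchmark` is hypothetical, and the NS case is modelled as handing Python's own buffer to the OS without forcing it to disk. CPython documents `os.fsync` as calling `_commit()` on Windows, which flushes through FlushFileBuffers, and `fsync(2)` elsewhere.

```python
import os
import time

PAGE_SIZE = 4096      # bytes per page, as in the question
PAGES_PER_BATCH = 8   # 8 pages = 32 KB written between flushes


def run_sequential_benchmark(path, total_pages, use_fsync=True):
    """Write total_pages pages of random data sequentially, flushing after every batch.

    Returns (elapsed_seconds, flush_count). use_fsync=True models the FS option;
    use_fsync=False only pushes Python's buffer to the OS (the NS option).
    """
    total_bytes = total_pages * PAGE_SIZE
    batch_bytes = PAGE_SIZE * PAGES_PER_BATCH
    flushes = 0
    with open(path, "wb") as f:
        f.truncate(total_bytes)  # set the final length up front; position stays at 0
        start = time.perf_counter()
        for offset in range(0, total_bytes, batch_bytes):
            size = min(batch_bytes, total_bytes - offset)
            f.write(os.urandom(size))  # sequential: the file position just advances
            f.flush()                  # hand the batch to the OS (all NS does)
            if use_fsync:
                os.fsync(f.fileno())   # FS: ask the OS to put it on disk
            flushes += 1
        elapsed = time.perf_counter() - start
    return elapsed, flushes
```

A full-size run would be `run_sequential_benchmark(path, 200_000)`, i.e. 200,000 × 4096 = 819,200,000 bytes (the "800 MB") in 25,000 batches; smaller page counts are handy for a quick check.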

It supports four options, and all combinations of them are run:

  • Perform an "fsync"/FlushFileBuffers operation (FS) after writing a batch, or a normal flush (NS).
  • Write a single byte to the last position of the file before starting to write (LB), or leave the file empty (E).
  • Use normal buffered writes (B) or unbuffered/writethrough writes (WT) (using FILE_FLAG_NO_BUFFERING and FILE_FLAG_WRITE_THROUGH).
  • Write directly to the file stream, i.e. through the file handle (F), or indirectly through a memory map (MM).
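Since each of the four options has two values, the full matrix is 2^4 = 16 runs. A small sketch that enumerates them; the label format (for example `FS-LB-B-F`) and the function name `all_scenarios` are hypothetical conventions, not necessarily how the question's table is laid out.

```python
from itertools import product

# The four two-valued options, using the question's abbreviations.
FLUSH = ("FS", "NS")      # FlushFileBuffers ("fsync") vs. normal flush
PREP = ("LB", "E")        # last byte written vs. file left empty
BUFFERING = ("B", "WT")   # buffered vs. NO_BUFFERING + WRITE_THROUGH
TARGET = ("F", "MM")      # file handle vs. memory map


def all_scenarios():
    """Return every option combination as a dash-separated label."""
    return ["-".join(combo) for combo in product(FLUSH, PREP, BUFFERING, TARGET)]
```

`itertools.product` varies the last option fastest, so the list starts with `FS-LB-B-F` and ends with `NS-E-WT-MM`.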

The table below summarizes my findings on my system (64-bit Windows 7 laptop with a slow spindle disk) for all combinations of these options.

[Table image: Benchmark results for all combinations of options]

What I found is that the performance of "fsynced" buffered writes decreases exponentially as the file grows, down to a throughput so low that this approach is not feasible for large files. If the file had its last byte written (option LB), the throughput is even lower, so I fear that with random rather than sequential writes the drop will be even more dramatic.

Surprisingly, however, with unbuffered/writethrough I/O the throughput remains constant, independent of the size of the file. Initially (the first 100-200 MB) its throughput is lower than that of the buffered writes, but after that the average throughput catches up quickly and it finishes writing the 800 MB substantially sooner. Even more surprising, if the file had its last byte written, the throughput increases by a factor of 2.
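On POSIX systems, a rough analogue of FILE_FLAG_WRITE_THROUGH is opening the file with `O_DSYNC`, so that each write returns only after the data is on stable storage. The sketch below (hypothetical function name) overwrites the file sequentially in the same 32 KB batches without any explicit fsync. It deliberately leaves out `O_DIRECT`, the rough counterpart of FILE_FLAG_NO_BUFFERING, because that flag requires aligned buffers, offsets and sizes. It illustrates the mode only and does not reproduce the Windows behaviour measured here.

```python
import os

PAGE_SIZE = 4096
PAGES_PER_BATCH = 8


def write_through_sequential(path, total_pages):
    """Write total_pages pages of random data in 32 KB batches using synchronous writes.

    Returns the number of bytes written. O_DSYNC is POSIX-only; CPython on Windows
    does not expose it, so there the flag falls back to 0 (ordinary buffered writes).
    """
    flags = (os.O_WRONLY | os.O_CREAT
             | getattr(os, "O_DSYNC", 0)    # write-through analogue (POSIX only)
             | getattr(os, "O_BINARY", 0))  # no newline translation on Windows
    fd = os.open(path, flags, 0o644)       # no O_TRUNC: existing data is overwritten
    written = 0
    total = total_pages * PAGE_SIZE
    try:
        while written < total:
            size = min(PAGE_SIZE * PAGES_PER_BATCH, total - written)
            view = memoryview(os.urandom(size))
            while view:                     # os.write may, in principle, write less
                n = os.write(fd, view)
                view = view[n:]
                written += n
    finally:
        os.close(fd)
    return written
```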

When writing to the file through a memory-mapped view, the same exponential decrease in performance is seen, even when the file was opened with the unbuffered/writethrough flags. Here too, performance is worse if the file had a byte written to its last position.
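A sketch of the memory-mapped variant in Python, whose `mmap.flush()` calls FlushViewOfFile on Windows and `msync()` on POSIX systems. The Windows documentation for FlushViewOfFile notes that it does not flush file metadata or wait for the data to leave the disk cache, and recommends calling FlushFileBuffers afterwards; the `os.fsync` call plays that role here. The function name `write_via_mmap` is hypothetical and the data is random.

```python
import mmap
import os

PAGE_SIZE = 4096
PAGES_PER_BATCH = 8


def write_via_mmap(path, total_pages, use_fsync=True):
    """Write random pages sequentially through a memory map, flushing each 32 KB batch.

    Returns the number of flushes performed.
    """
    total = total_pages * PAGE_SIZE
    batch = PAGE_SIZE * PAGES_PER_BATCH
    with open(path, "wb") as f:
        f.truncate(total)  # a mapping cannot extend the file, so size it first
    flushes = 0
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), total) as m:
        for offset in range(0, total, batch):
            size = min(batch, total - offset)
            m[offset:offset + size] = os.urandom(size)
            # Flush only the dirty range; msync needs a page-aligned start.
            start = offset - offset % mmap.PAGESIZE
            m.flush(start, offset + size - start)  # FlushViewOfFile / msync
            if use_fsync:
                os.fsync(f.fileno())  # file-level flush after the view flush
            flushes += 1
    return flushes
```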

UPDATE Based on Howard's explanations here and here, I reran the test without creating a new file before starting the writes (i.e. opening the existing, fully written file and overwriting it). I have updated the code in my original question to reflect the changes made for this test. The results are partly in line with his explanation and findings on Linux, but there are some notable exceptions. The table below provides the results: red highlights significant changes, and blue highlights where changes did not occur and this is surprising (i.e. not in line with expectations if the effects mentioned in Howard's explanation were the only ones at play).

[Table image: Results when overwriting existing file]
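The overwrite scenario from the UPDATE can be sketched as two steps: first write every page of the file once and sync it, then reopen it without truncation and overwrite it with fsynced batches. The function names are hypothetical, and writing zeros in the prefill step is an assumption; the question only says the existing file was fully written.

```python
import os

PAGE_SIZE = 4096


def prefill(path, total_pages):
    """Create the file and actually write every page once (zeros), then fsync it."""
    zero_page = bytes(PAGE_SIZE)
    with open(path, "wb") as f:
        for _ in range(total_pages):
            f.write(zero_page)
        f.flush()
        os.fsync(f.fileno())


def overwrite_in_place(path, pages_per_batch=8):
    """Overwrite an existing file sequentially, fsyncing after each batch.

    'r+b' opens the file without truncating it, so its length never changes.
    Returns the number of flushes performed.
    """
    size = os.path.getsize(path)
    batch = PAGE_SIZE * pages_per_batch
    flushes = 0
    with open(path, "r+b") as f:
        for offset in range(0, size, batch):
            f.write(os.urandom(min(batch, size - offset)))
            f.flush()
            os.fsync(f.fileno())
            flushes += 1
    return flushes
```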

For the buffered writes to file (i.e. not through the memory map) with an "fsync" flush, the performance has now changed from an exponentially decaying trend to a constant one. However, it now takes much longer than in the previous test scenarios. The throughput is a constant 1.5 MB/s, whereas before it started at around 20 MB/s and decayed exponentially to around 1.5 MB/s. A possible explanation is that the file metadata also gets flushed on each flush, requiring a seek, and a full disk revolution, to reach the location of the metadata.
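A constant throughput with 32 KB per flush can be turned into an average time per flush, which makes the seek/rotation hypothesis easier to check. The spindle speed of the laptop disk is not stated, so the 5400 and 7200 rpm figures used below are assumptions, and MB is taken as 10^6 bytes.

```python
BATCH_BYTES = 8 * 4096  # 32 KB written per flush


def ms_per_flush(throughput_mb_s, batch_bytes=BATCH_BYTES):
    """Average milliseconds per flush implied by a steady throughput in MB/s."""
    return batch_bytes / (throughput_mb_s * 1e6) * 1000.0


def ms_per_revolution(rpm):
    """Milliseconds for one full platter revolution at the given spindle speed."""
    return 60_000.0 / rpm
```

With these helpers, `ms_per_flush(1.5)` is about 21.8 ms and `ms_per_flush(20)` about 1.6 ms, while one revolution takes about 11.1 ms at 5400 rpm and 8.3 ms at 7200 rpm. At 1.5 MB/s each flush therefore costs roughly two revolutions of a 5400 rpm disk, which is consistent with (though not proof of) an extra seek and rotation per flush; 25,000 such flushes add up to about 546 s for the 800 MB.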

For the "write through" to file scenarios, the results with and without writing the last byte are now identical, in line with what is expected from Howard's explanation.

The writes to the memory map, however, have with one notable exception not really changed, which is surprising. They still show the same exponential decay in write performance (starting at around 20 MB/s and decaying to 1.8 MB/s), which suggests that a different mechanism is at play. The one notable exception is when the underlying file was created without FILE_FLAG_WRITE_THROUGH and "fsync" flushes are performed. This scenario now shows constant (poor) performance, with a throughput of around 1.6 MB/s. Since I had some doubts, I reran this scenario multiple times and got the same result each time.

To figure out a bit more, I also reran this test using a smaller file (50,000 pages, amounting to 200 MB) to confirm that the fsync performance (for buffered I/O) actually does depend on file size. The results are shown below, with those that deserve special attention highlighted in red.

[Table image: Results on smaller file]

These results correlate well with what was seen for the larger file. The notable change is that the highlighted scenarios are somewhat faster, seemingly hitting a limit of around 7 MB/s.

To summarize, here are some highly speculative conclusions based on my observations on my system so far:

  • "fsync" performance on Windows on files with buffered I/O (i.e. without the FILE_FLAG_WRITE_THROUGH flag) decreases exponentially with the number of bytes already written to the file. The reason seems to be the need to flush the file metadata every time, which causes a disk seek to the start of the file.
  • "fsync" performance on Windows when writing to a memory-mapped file also decreases exponentially. I do not currently have an explanation for the exact mechanism(s) causing this.

Given this observed performance, at least for my use case, these two I/O options would not represent feasible solutions.
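To check whether a throughput trace really decays exponentially with the amount of data written, the per-flush timings can be bucketed by file offset. For an exponential decay, the throughput of consecutive equal-sized windows shrinks by a roughly constant ratio, while a constant trend gives ratios near 1. A sketch with hypothetical function names, assuming the benchmark records the duration of each write-plus-flush batch in seconds.

```python
def throughput_by_window(flush_times, batch_bytes=8 * 4096, window_bytes=100 * 10**6):
    """Group per-batch durations into windows of file offset and return MB/s per window.

    flush_times[i] is the time in seconds taken by the i-th batch (write + flush).
    Returns a list of (window_start_mb, mb_per_s) tuples, with MB = 10**6 bytes.
    """
    per_window = max(1, window_bytes // batch_bytes)  # batches per window
    results = []
    for start in range(0, len(flush_times), per_window):
        chunk = flush_times[start:start + per_window]
        seconds = sum(chunk)
        mb = len(chunk) * batch_bytes / 1e6
        rate = mb / seconds if seconds > 0 else float("inf")
        results.append((start * batch_bytes / 1e6, rate))
    return results


def successive_ratios(results):
    """Ratio of each window's throughput to the previous window's throughput."""
    rates = [rate for _, rate in results]
    return [b / a for a, b in zip(rates, rates[1:])]
```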

As Greg suggested, I will rerun the test with Windows disk caching turned off, and I will also run Howard's benchmark code to rule out the possibility that the results are skewed by errors in my own.

UPDATE 2 I have completed the tests and am currently compiling the results. In order not to write "the full history of", I will be replacing the current contents of this question with a summary of the results, findings and some conclusions. Howard's answers on this question, and the ability to run his C benchmark code next to the .NET code, have been most useful. The results of those two applications correlate quite well. Rlb's answer helped me get a better feel for what "reasonable numbers" are for disks. Thanks.

Part of the question remains unanswered, particularly the decreasing (and file-size-dependent) performance observed when writing to a memory map. It may be related to seeks/metadata flushes, but it is not yet clear to me why or how.

asked Aug 19 '13 07:08 by Alex


1 Answer

You're seeing an exponential decrease in speed on the sync runs because, contrary to what you believe, these aren't purely sequential workloads. Since you're starting with a new file each time, your writes are growing the file, and the metadata needs to be updated in the filesystem. That requires multiple seeks, and as the file grows, the seeks from the end of the file to the metadata take longer and longer. I also posted this on your other question by mistake; see the full answer there: https://stackoverflow.com/a/18429712/894520

answered Oct 24 '22 00:10 by hyc