Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Which is faster, writing raw data to a drive, or writing to a file?

Tags:

c++

c

windows

I need to write data into drive. I have two options:

  1. write raw sectors.(_write(handle, pBuffer, size);)
  2. write into a file (fwrite(pBuffer, size, count, pFile);)

Which way is faster?

I expected the raw sector writing function, _write, to be more efficient. However, my test result failed! fwrite is faster. _write costs longer time.

I've pasted my snippet; maybe my code is wrong. Can you help me out? Either way is okay by me, but I think raw write is better, because it seems the data in the drive is encrypted at least....

#define SSD_SECTOR_SIZE 512
int g_pSddDevHandle = _open("\\\\.\\G:",_O_RDWR | _O_BINARY, _S_IREAD | _S_IWRITE);
TIMER_START();
while (ulMovePointer < 1024 * 1024 * 1024)
{
    _write(g_pSddDevHandle,szMemZero,SSD_SECTOR_SIZE);
    ulMovePointer += SSD_SECTOR_SIZE;
}
TIMER_END();
TIMER_PRINT();
FILE * file = fopen("f:\\test.tmp","a+");
TIMER_START();
while (ulMovePointer < 1024 * 1024 * 1024)
{
    fwrite(szMemZero,SSD_SECTOR_SIZE,1,file);
    ulMovePointer += SSD_SECTOR_SIZE;
}
TIMER_END();
TIMER_PRINT();
like image 896
Shaobo Wang Avatar asked Nov 04 '09 06:11

Shaobo Wang


2 Answers

Probably because a direct write isn't buffered. When you call fwrite, you are doing buffered writes which tend to be faster in most situations. Essentially, each FILE* handler has an internal buffer which is flushed to disk periodically when it becomes full, which means you end up making less system calls, as you only write to disk in larger chunks.

To put it another way, in your first loop, you are actually writing SSD_SECTOR_SIZE bytes to disk during each iteration. In your second loop you are not. You are only writing SSD_SECTOR_SIZE bytes to a memory buffer, which, depending on the size of the buffer, will only be flushed every Nth iteration.

like image 70
Charles Salvia Avatar answered Oct 22 '22 09:10

Charles Salvia


In the _write() case, the value of SSD_SECTOR_SIZE matters. In the fwrite case, the size of each write will actually be BUFSIZ. To get a better comparison, make sure the underlying buffer sizes are the same.

However, this is probably only part of the difference.

In the fwrite case, you are measuring how fast you can get data into memory. You haven't flushed the stdio buffer to the operating system, and you haven't asked the operating system to flush its buffers to physical storage. To compare more accurately, you should call fflush() before stopping the timers.

If you actually care about getting data onto the disk rather than just getting the data into the operating systems buffers, you should ensure that you call fsync()/FlushFileBuffers() before stopping the timer.

Other obvious differences:

  • The drives are different. I don't know how different.

  • The semantics of a write to a device are different to the semantics of writes to a filesystem; the file system is allowed to delay writes to improve performance until explicitly told not to (eg. with a standard handle, a call to FlushFileBuffers()); writes directly to a device aren't necessarily optimised in that way. On the other hand, the file system must do extra I/O to manage metadata (block allocation, directory entries, etc.)

I suspect that you're seeing a different in policy about how fast things actually get on to the disk. Raw disk performance can be very fast, but you need big writes and preferably multiple concurrent outstanding operations. You can also avoid buffer copying by using the right options when you open the handle.

like image 39
janm Avatar answered Oct 22 '22 07:10

janm