I wrote a program that compacts two small files into a single, bigger file. I first read data from the input files, merge the data, and write the output to a temp file. Once this completes, I rename the temp file to the desired file name (located in the same partition on disk). Here is the pseudocode:
FILE* fp_1 = fopen("file_1.dat", "r+b");
FILE* fp_2 = fopen("file_2.dat", "r+b");
FILE* fp_out = fopen("file_tmp.dat", "w+b");
// 1. Read data for the key in two files
const char* data_1 = ...;
const char* data_2 = ...;
// 2. Merge data, store in an allocated buffer
// 3. Write merged buffer to temp file
fwrite(temp_buff, estimated_size, 1, fp_out);
fflush(fp_out);
fclose(fp_1);
fclose(fp_2);
fclose(fp_out);
// Now rename temp file to desired file name
if (std::rename("file_tmp.dat", "file_out.dat") == 0)
{
    std::remove("file_1.dat");
    std::remove("file_2.dat");
}
I repeatedly tested the program with two input files of 5 MB each. One time I abruptly shut down the system by unplugging the power cable. After restarting, I checked the data and found that the input files had been removed and file_out.dat was filled with all zeros. This made me believe that the system went down right after the two input files were removed, while the output data was still somewhere in the disk controller's buffer. If this is true, is there any way I can check whether the data has actually been written to disk?
Not in the general case. Even if you tell the OS to wait until the data is written (with the sync API family), some disks lie to the OS, claiming the write finished when it's really just queued in the hard drive's onboard RAM cache, which will be lost on abrupt power loss.
The best you can do is explicitly ask the OS to tell the disk to "really, really sync everything and block until it's done" after you've performed the fflush (which only tells the stdio library to send all user-mode buffered data to the OS; the OS often keeps it in kernel buffers and syncs those to disk later, in the background). You can do this either with limited scope using fsync, or with something like sync or syncfs (the former syncs all file systems, the latter limits the scope to the file system corresponding to a single file descriptor). Note that fsync takes a file descriptor, so with a FILE* you pass it fileno(fp).
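To make the difference between fflush and fsync concrete, here is a minimal sketch for a POSIX system. The helper name flush_to_disk is hypothetical (it is not part of the original program); it only combines the two calls described above and reports whether both succeeded.

```cpp
#include <cstdio>    // std::FILE, std::fflush
#include <stdio.h>   // fileno (POSIX)
#include <unistd.h>  // fsync (POSIX)

// Hypothetical helper: push everything written to fp as far toward the
// disk as the OS allows. Returns true only if both stages succeed.
bool flush_to_disk(std::FILE* fp)
{
    if (fp == nullptr) {
        return false;
    }
    // Stage 1: stdio's user-mode buffer -> kernel buffers.
    if (std::fflush(fp) != 0) {
        return false;
    }
    // Stage 2: kernel buffers -> storage device (blocks until the OS
    // believes the write is done; the drive may still be lying).
    if (fsync(fileno(fp)) != 0) {
        return false;
    }
    return true;
}
```

The helper has to be called while the stream is still open, that is, before fclose, because fsync needs the underlying descriptor.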
For maximum safety, you'd want to:
1. fsync after the final fflush but before the rename (and before the fclose, since fsync needs the still-open descriptor), so the new file is complete on disk before replacing the old one, and
2. sync/syncfs after the rename but before the remove calls, so the metadata updates from the rename are complete before you delete the source files.
Omitting step 1 is okay if you don't mind corrupted output data in cases where the input data still exists.
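The ordering above could be sketched roughly as follows, assuming a POSIX system. The function name commit_merged_file and its parameters are hypothetical, chosen only for illustration; the merge step itself is left out and the already-merged buffer is passed in.

```cpp
#include <cstdio>    // std::fopen, std::fwrite, std::rename, std::remove
#include <stdio.h>   // fileno (POSIX)
#include <unistd.h>  // fsync, sync (POSIX)

// Hypothetical helper: write the merged buffer to tmp_path, make it durable,
// rename it to out_path, and only then delete the two input files.
// Returns false on failure; the inputs are not removed in that case.
bool commit_merged_file(const char* data, std::size_t size,
                        const char* tmp_path, const char* out_path,
                        const char* in_1, const char* in_2)
{
    std::FILE* fp_out = std::fopen(tmp_path, "w+b");
    if (fp_out == nullptr) {
        return false;
    }

    // Step 1: fflush, then fsync, while the stream is still open.
    bool ok = std::fwrite(data, 1, size, fp_out) == size
           && std::fflush(fp_out) == 0
           && fsync(fileno(fp_out)) == 0;
    ok = (std::fclose(fp_out) == 0) && ok;
    if (!ok) {
        std::remove(tmp_path);  // discard the incomplete temp file
        return false;
    }

    if (std::rename(tmp_path, out_path) != 0) {
        return false;
    }

    // Step 2: flush the rename's metadata before touching the inputs.
    // sync() covers all file systems; syncfs (Linux-specific) would
    // narrow this to one file system.
    sync();

    std::remove(in_1);
    std::remove(in_2);
    return true;
}
```

With the file names from the question, a call might look like commit_merged_file(temp_buff, estimated_size, "file_tmp.dat", "file_out.dat", "file_1.dat", "file_2.dat"). This is still best effort: it narrows the window for data loss but cannot defeat a drive that misreports completed writes.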
Of course, like I said, this is all best effort; if the disk controller is lying to the OS, there is nothing you can do shy of writing new firmware and drivers for the disk, which is probably going too far.