
Secure and efficient way to modify multiple files on POSIX systems?

I have been following the discussion of the "bug" in EXT4 that causes files to be zeroed after a crash if one uses the "create temp file, write temp file, rename temp to target file" process. POSIX says that unless fsync() is called, you cannot be sure the data has been flushed to the hard disk.

Obviously doing:

0) get the file contents (read it or make it somehow)
1) open original file and truncate it
2) write new contents
3) close file

is not good even with fsync(), as the computer can crash during 2) or during fsync(), and you end up with a partially written file.

Usually it has been thought that this is pretty safe:

0) get the file contents (read it or make it somehow)
1) open temp file
2) write contents to temp file
3) close temp file
4) rename temp file to original file

Unfortunately it isn't. To make it safe on EXT4 you would need to do:

0) get the file contents (read it or make it somehow)
1) open temp file
2) write contents to temp file
3) fsync()
4) close temp file
5) rename temp file to original file

This would be safe and on crash you should either have the new file contents or old, never zeroed contents or partial contents. But if the application uses lots of files, fsync() after every write would be slow.
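The fsync-then-rename sequence above can be sketched roughly as follows. This is a minimal illustration in Python, not a definitive implementation: the helper name `atomic_write` is hypothetical, and the sketch assumes the temp file can be created in the same directory as the target so that the rename stays within one filesystem.

```python
import os
import tempfile


def atomic_write(path, data):
    """Replace `path` with `data` (bytes) using temp file + fsync + rename.

    Hypothetical helper for illustration only.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # 1) open temp file (same directory, so rename does not cross filesystems)
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)          # 2) write contents to temp file
            tmp.flush()              # hand Python's buffered data to the kernel
            os.fsync(tmp.fileno())   # 3) fsync()
        # 4) the temp file is closed on leaving the with-block
        os.replace(tmp_path, path)   # 5) rename temp file to original file
    except BaseException:
        # On failure, remove the temp file if it is still there.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

Note that `tempfile.mkstemp` creates the file with restrictive permissions, so a real implementation would probably also want to copy the original file's mode and ownership before the rename.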

So my question is, how to modify multiple files efficiently on a system where fsync() is required to be sure that changes have been saved to disk? And I really mean modifying many files, as in thousands of files. Modifying two files and doing fsync() after each wouldn't be too bad, but fsync() does slow things down when modifying multiple files.

EDIT: changed fsync() and close temp file to the correct order, added emphasis on writing many many many files.

Raynet asked Mar 20 '09 12:03



2 Answers

The short answer is: The app layer is the wrong place to solve this. EXT4 must make sure that after I close the file, the data is written in a timely manner. As it is now, EXT4 "optimizes" this writing to be able to collect more write requests and burst them out in one go.

The problem is obvious: No matter what you do, you can't be sure that your data ends up on the disk. Calling fsync() manually only makes things worse: You basically get in the way of EXT4's optimization, slowing the whole system down.

OTOH, EXT4 has all the information necessary to make an educated guess when it is necessary to write data out to the disk. In this case, I rename the temp file to the name of an existing file. For EXT4, this means that it must either postpone the rename (so the data of the original file stays intact after a crash) or it must flush at once. Since it can't postpone the rename (the next process might want to see the new data), renaming implicitly means to flush and that flush must happen on the FS layer, not the app layer.

EXT4 might create a virtual copy of the filesystem which contains the changes while the disk is not modified (yet). But this doesn't affect the ultimate goal: An app can't know what optimizations the FS is going to make and therefore, the FS must make sure that it does its job.

This is a case where ruthless optimizations have gone too far and ruined the results. Golden rule: Optimization must never change the end result. If you can't maintain this, you must not optimize.

As long as Tso believes that it is more important to have a fast FS than one which behaves correctly, I suggest not upgrading to EXT4 and closing all bug reports about this as "works as designed by Tso".

[EDIT] Some more thoughts on this. You could use a database instead of the file. Let's ignore the resource waste for a moment. Can anyone guarantee that the files, which the database uses, won't become corrupted by a crash? Probably. The database can write the data and call fsync() every minute or so. But then, you could do the same:

while true; do sync; sleep 60; done

Again, the bug in the FS prevents this from working in every case. Otherwise, people wouldn't be so bothered by this bug.

You could use a background config daemon like the Windows registry. The daemon would write all configs in one big file. It could call fsync() after writing everything out. Problem solved ... for your configs. Now you need to do the same for everything else your apps write: Text documents, images, whatever. I mean almost any Unix process creates a file. This is the freaking basis of the whole Unix idea!

Clearly, this is not a viable path. So the answer remains: There is no solution on your side. Keep bothering Tso and the other FS developers until they fix their bugs.

Aaron Digulla answered Nov 07 '22 21:11



My own answer would be to make all the modifications in temp files and, after finishing writing them all, do one fsync() and then rename them all.
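A rough sketch of this batched approach, in Python, might look like the following. Several things here are assumptions rather than part of the answer: the function name `replace_many` is hypothetical, and because fsync() only covers a single file descriptor, the sketch swaps in one sync() call (`os.sync()`) as the single flush. On Linux, sync() waits for the writes to complete; POSIX only requires it to schedule them, so on other systems an fsync() per temp file may be needed instead.

```python
import os
import tempfile


def replace_many(updates):
    """Replace many files: write all temps, flush once, then rename all.

    `updates` maps target path -> new contents (bytes).
    Hypothetical helper for illustration only.
    """
    staged = []  # (temp path, target path) pairs
    try:
        # Phase 1: write every temp file without flushing individually.
        for target, data in updates.items():
            directory = os.path.dirname(os.path.abspath(target))
            fd, tmp_path = tempfile.mkstemp(dir=directory)
            staged.append((tmp_path, target))
            with os.fdopen(fd, "wb") as tmp:
                tmp.write(data)

        # Phase 2: one flush for everything (assumption: sync() blocks
        # until the data is written, as it does on Linux).
        os.sync()

        # Phase 3: rename each temp file over its target.
        for tmp_path, target in staged:
            os.replace(tmp_path, target)
    except BaseException:
        # Remove temp files that were not renamed yet.
        for tmp_path, _ in staged:
            try:
                os.unlink(tmp_path)
            except FileNotFoundError:
                pass
        raise
```

Each file is still replaced individually, so a crash during phase 3 can leave some targets with old contents and some with new ones, but under the stated assumption none should be zeroed or partial.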

Raynet answered Nov 07 '22 22:11
