Does Linux guarantee the contents of a file are flushed to disc after close()?

From "man 2 close":

A successful close does not guarantee that the data has been successfully saved to disk, as the kernel defers writes.

The man page says that if you want to be sure that your data are on disk, you have to use fsync() yourself.
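
As a minimal sketch of what that advice amounts to (the path and payload below are just placeholders), the sequence is: write(), then fsync() the descriptor, then close(), checking each return value:

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        const char *data = "important bytes\n";   /* placeholder payload */
        int fd = open("/tmp/example.txt", O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd == -1) { perror("open"); return 1; }

        if (write(fd, data, strlen(data)) == -1) { perror("write"); return 1; }

        /* Ask the kernel to push the file's data and metadata to the device. */
        if (fsync(fd) == -1) { perror("fsync"); return 1; }

        /* close() itself adds no durability guarantee; it only releases the fd. */
        if (close(fd) == -1) { perror("close"); return 1; }
        return 0;
    }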


No, close does not perform an fsync(2) and would batter many machines to death if it did so. Many intermediate files are opened and closed by their creator, then opened and closed by their consumer, then deleted, and this very common sequence would require touching the disk if close(2) performed an automatic fsync(2). Instead, the disk is usually not touched and the disk never knows the file was there.


It is also important to note that fsync does not guarantee a file is on disk; it just guarantees that the OS has asked the filesystem to flush changes to the disk. The filesystem does not have to write anything to disk.

From "man 3 fsync":

If _POSIX_SYNCHRONIZED_IO is not defined, the wording relies heavily on the conformance document to tell the user what can be expected from the system. It is explicitly intended that a null implementation is permitted.

Luckily, all of the common filesystems for Linux do in fact write the changes to disk; unluckily, that still doesn't guarantee the file is on the disk. Many hard drives come with write buffering turned on (and therefore have their own buffers that fsync does not flush). And some drives/RAID controllers even lie to you about having flushed their buffers.


No. fclose() doesn't imply fsync(). A lot of Linux file systems delay writes and batch them up, which improves overall performance, presumably reduces wear on the disk drive, and improves battery life for laptops. If the OS had to write to disk whenever a file closed, many of these benefits would be lost.
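
For stdio streams the same caveat holds, so a hedged sketch (the function name and path are made up for illustration) is to fflush() the stream, fsync() the underlying descriptor, and only then fclose():

    #include <stdio.h>
    #include <unistd.h>

    /* Illustrative helper: write text to a file and try to make it durable.
     * fclose() alone only moves the C library's buffer into the kernel;
     * it does not force the kernel's page cache out to the device. */
    int write_durably(const char *path, const char *text)
    {
        FILE *fp = fopen(path, "w");
        if (!fp) return -1;

        if (fputs(text, fp) == EOF) goto fail;
        if (fflush(fp) != 0)        goto fail;   /* stdio buffer -> kernel  */
        if (fsync(fileno(fp)) != 0) goto fail;   /* kernel cache -> device  */
        return fclose(fp);                       /* finally release the FILE */

    fail:
        fclose(fp);
        return -1;
    }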

Paul Tomblin mentioned a controversy in his answer, and explaining the one I've seen won't fit into a comment. Here's what I've heard:

The recent controversy is over the ext4 ordering (ext4 is the proposed successor to the popular ext3 Linux file system). It is customary, in Linux and Unix systems, to change important files by reading the old one, writing out the new one with a different name, and renaming the new one to the old one. The idea is to ensure that either the new one or the old one will be there, even if the system fails at some point. Unfortunately, ext4 appears to be happy to read the old one, rename the new one to the old one, and write the new one, which can be a real problem if the system goes down between steps 2 and 3.

The standard way to deal with this is of course fsync(), but that trashes performance. The real solution is to modify ext4 to keep the ext3 ordering, where a file would not be renamed into place until it had finished writing out. Apparently this isn't covered by the standard, so it's a quality-of-implementation issue, and ext4's QoI is really lousy here: there is no way to reliably write a new version of a configuration file without either constantly calling fsync(), with all the problems that causes, or risking losing both versions.
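
For reference, a sketch of the belt-and-braces version of that rename pattern (all paths here are placeholders): write the new contents under a temporary name, fsync() it, rename() it over the old file, and optionally fsync() the containing directory so the rename itself is recorded on disk:

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    /* Replace a config file atomically with new contents (paths are placeholders). */
    int replace_config(const char *dir, const char *tmp_path,
                       const char *final_path, const char *contents)
    {
        int fd = open(tmp_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd == -1) return -1;

        if (write(fd, contents, strlen(contents)) == -1 ||
            fsync(fd) == -1 ||                   /* new data must hit the disk */
            close(fd) == -1)
            return -1;

        if (rename(tmp_path, final_path) == -1)  /* atomic swap of the names   */
            return -1;

        /* Optionally flush the directory entry so the rename survives a crash. */
        int dfd = open(dir, O_RDONLY);
        if (dfd == -1) return -1;
        int rc = fsync(dfd);
        close(dfd);
        return rc;
    }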


No, it's not guaranteed. The OS has its own caching. All close really guarantees is that the program's buffers are flushed to the OS, but the OS may still be holding onto the data unwritten. I believe there is some controversy in the Linux kernel world because even fsync doesn't guarantee that it's flushed to disk, at least in ext3.


The manpage for open says:

To guarantee synchronous I/O, O_SYNC must be used in addition to O_DIRECT.

and that

In general this (O_DIRECT) will degrade performance.

One could toggle this flag using fcntl() with F_SETFL so as to minimize cache effects of the I/O for every read and write thereafter.
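
A rough sketch of that fcntl() toggle follows, assuming fd is an already-open descriptor. Note that which flags can actually be changed after open() is kernel-dependent (O_DIRECT usually can be, while O_SYNC historically cannot, as the bug report mentioned below illustrates), so treat this as illustrative:

    #define _GNU_SOURCE      /* needed for O_DIRECT on Linux */
    #include <fcntl.h>
    #include <stdio.h>

    /* Try to turn on O_DIRECT for an already-open descriptor.
     * Subsequent reads/writes then bypass the page cache, but must use
     * suitably aligned buffers, sizes, and offsets. */
    int enable_direct_io(int fd)
    {
        int flags = fcntl(fd, F_GETFL);
        if (flags == -1) {
            perror("fcntl(F_GETFL)");
            return -1;
        }
        if (fcntl(fd, F_SETFL, flags | O_DIRECT) == -1) {
            perror("fcntl(F_SETFL)");
            return -1;
        }
        return 0;
    }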


You may also be interested in this bug report from the Firebird SQL database regarding fcntl(O_SYNC) not working on Linux.

In addition, the question you ask implies a potential problem. What do you mean by writing to the disk? Why does it matter? Are you concerned that the power goes out and the file is missing from the drive? Why not use a UPS on the system or the SAN?

In that case you need a journaling file system, and not just a metadata-journaling file system but one that journals all the data as well.

Even in that case you must understand that, besides the OS's involvement, most hard disks lie to you about doing an fsync: fsync just sends the data to the drive, and it is up to the individual operating system to know how to wait for the drive to flush its own caches.

--jeffk++