Is it safe to call rename(tmppath, path) without calling fsync(tmppath_fd) first?
I want the path to always point to a complete file. I care mainly about ext4. Is this use of rename() guaranteed to be safe in all future Linux kernel versions?
A usage example in Python:
    import os

    def store_atomically(path, data):
        tmppath = path + ".tmp"
        output = open(tmppath, "wb")
        output.write(data)
        output.flush()
        os.fsync(output.fileno())  # The needed fsync().
        output.close()
        os.rename(tmppath, path)
No.
Look at libeatmydata, and this presentation by Stewart Smith from MySQL:
http://www.oscon.com/oscon2008/public/schedule/detail/3172
In case it is offline/no longer available, I keep a copy of it:
From the ext4 documentation:
When mounting an ext4 filesystem, the following option are accepted: (*) == default

auto_da_alloc(*), noauto_da_alloc:
Many broken applications don't use fsync() when replacing existing files via patterns such as fd = open("foo.new")/write(fd,..)/close(fd)/rename("foo.new", "foo"), or worse yet, fd = open("foo", O_TRUNC)/write(fd,..)/close(fd). If auto_da_alloc is enabled, ext4 will detect the replace-via-rename and replace-via-truncate patterns and force that any delayed allocation blocks are allocated such that at the next journal commit, in the default data=ordered mode, the data blocks of the new file are forced to disk before the rename() operation is committed. This provides roughly the same level of guarantees as ext3, and avoids the "zero-length" problem that can happen when a system crashes before the delayed allocation blocks are forced to disk.
Judging by the wording "broken applications", the ext4 developers definitely consider it bad practice, but it is such a widely used approach that ext4 itself was patched to accommodate it.
So if your usage fits the pattern, you should be safe.
If not, I suggest you investigate further instead of inserting fsync here and there just to be safe. That might not be such a good idea, since fsync can be a major performance hit on ext3 (read).
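Since the safety of skipping fsync depends on auto_da_alloc staying enabled, one thing worth investigating is whether any ext4 filesystem is mounted with noauto_da_alloc. Here is a minimal sketch, assuming the option shows up in the options column of /proc/mounts when it is set; the function name is made up for illustration.

```python
def ext4_mounts_without_auto_da_alloc(mounts_text):
    """Return mount points of ext4 filesystems mounted with noauto_da_alloc.

    mounts_text is expected to look like the contents of /proc/mounts:
    device, mount point, filesystem type, comma-separated options, ...
    """
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point, fs_type, options = fields[1], fields[2], fields[3]
        # Assumption: a disabled auto_da_alloc is listed as "noauto_da_alloc".
        if fs_type == "ext4" and "noauto_da_alloc" in options.split(","):
            result.append(mount_point)
    return result
```

You would pass it the text read from /proc/mounts. An empty result only means the option was not listed there; it does not prove anything about other filesystems or future kernels.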
On the other hand, flushing before rename is the correct way to do the replacement on non-journaling file systems. Maybe that's why ext4 at first expected this behavior from programs; the auto_da_alloc option was added later as a fix. Also, this ext3 patch for the writeback mode (data=writeback, where only metadata is journaled) tries to help careless programs by flushing asynchronously on rename to lower the chance of data loss.
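For the flush-before-rename approach, here is a slightly more defensive sketch of the function from the question. It is only an illustration, not the one true way: the sync_directory parameter is my own addition, and the directory fsync (which asks the kernel to persist the rename itself, not just the file contents) is an extra step that goes beyond what the ext4 documentation quoted above discusses.

```python
import os

def store_atomically(path, data, sync_directory=True):
    """Write data to path so that path always names a complete file."""
    tmppath = path + ".tmp"
    with open(tmppath, "wb") as output:
        output.write(data)
        output.flush()             # push Python's buffer to the kernel
        os.fsync(output.fileno())  # ask the kernel to write the data to disk
    os.replace(tmppath, path)      # rename over the old file
    if sync_directory:
        # Assumption: fsync on a directory descriptor is supported (true on Linux).
        dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
```

Using a with block makes sure the temporary file is closed even if write() raises, and the file is always synced before the rename, so the code does not rely on auto_da_alloc at all.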
You can read more about the ext4 problem here.