
In Python, how to make atomic rewrite of a file WITHOUT renaming it?

In Python, how can I safely overwrite a file without renaming it?

There is a question on SO:

  • How to safely write to a file?

where this topic is discussed, but the solutions provided there can't help me, because in my case many hard links point to the file being overwritten.

Is there any other method that guarantees an atomic change to the file (without renaming it)?

Thank you very much!

Igor Chubin asked Oct 12 '25 05:10



1 Answer

Python gives you access to underlying OS tools. Please review Atomic operations in UNIX.

Overall you have two requirements: atomicity and support for hard links. The answer you refer to also mentions safety.

The first, atomicity, can be satisfied only in a narrow sense, and only if you drop safety. Typically you'd use POSIX advisory locks instead. If every client uses them, you can have a very robust system; SQLite is one example.
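As a rough illustration of the advisory-lock approach, here is a minimal sketch using the standard fcntl module. The helper names overwrite_locked and read_locked are made up for this example, and it assumes a Unix system where every reader and writer cooperates by taking the lock. It protects cooperating processes from seeing a half-written file, but it does not protect against a crash in the middle of the write.

```python
import fcntl
import os


def overwrite_locked(path, data):
    """Overwrite path in place while holding an exclusive advisory lock."""
    # "rb+" opens the existing inode without truncating or replacing it,
    # so every hard link keeps pointing at the same file.
    with open(path, "rb+") as f:
        fcntl.lockf(f, fcntl.LOCK_EX)   # blocks until no one else holds the lock
        try:
            f.seek(0)
            f.write(data)
            f.truncate()                # drop leftover bytes of longer old content
            f.flush()
            os.fsync(f.fileno())        # ask the OS to flush before unlocking
        finally:
            fcntl.lockf(f, fcntl.LOCK_UN)


def read_locked(path):
    """Read path under a shared lock so a cooperating writer cannot interleave."""
    with open(path, "rb") as f:
        fcntl.lockf(f, fcntl.LOCK_SH)
        try:
            return f.read()
        finally:
            fcntl.lockf(f, fcntl.LOCK_UN)
```

A process that reads the file without calling read_locked is not blocked at all, which is exactly the limitation of advisory locks.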

Mandatory locking is available, but not commonly enabled. The main sticking point with mandatory locks is priority inversion: a non-privileged user can block a root process if both access the same file.

Hard links imply that you have to work at the inode level. Any function in the above reference that operates on a file descriptor will work.
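To see why hard links push you toward descriptor-based operations, the following sketch compares an in-place write with the usual write-to-temp-then-rename pattern. The file names are arbitrary, and it assumes a filesystem that supports hard links.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
name_a = os.path.join(workdir, "a.txt")
name_b = os.path.join(workdir, "b.txt")

with open(name_a, "wb") as f:
    f.write(b"version 1")
os.link(name_a, name_b)                 # second name for the same inode

# In-place write through a descriptor: the inode is unchanged,
# so the other name sees the new content.
with open(name_a, "rb+") as f:
    f.write(b"version 2")
same_inode = os.stat(name_a).st_ino == os.stat(name_b).st_ino
with open(name_b, "rb") as f:
    seen_via_b = f.read()

# Write-to-temp-then-rename: name_a now refers to a brand new inode,
# while name_b still points at the old one.
tmp = name_a + ".tmp"
with open(tmp, "wb") as f:
    f.write(b"version 3")
os.replace(tmp, name_a)
split_inode = os.stat(name_a).st_ino != os.stat(name_b).st_ino
with open(name_b, "rb") as f:
    stale_via_b = f.read()
```

After the rename, name_b still holds the previous content, which is the problem described in the question.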

Atomic but not safe

A single write system call is atomic up to a certain filesystem-dependent threshold. If you can afford to buffer your file data in memory (anonymous or mapped), you can atomically overwrite the file. For the sake of simplicity let's assume the file size is fixed.

Consider the code below. When two processes perform this action simultaneously, both writes start at offset 0, each runs in a single system call, and in the end only one write "wins".

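#!/usr/bin/env python3
import sys

# Read the whole source file into memory first.
with open(sys.argv[1], "rb") as src:
    data = src.read()

# buffering=0 gives a raw file object, so write() maps to one write(2) call.
with open(sys.argv[2], "rb+", buffering=0) as dst:
    dst.seek(0)
    written = dst.write(data)  # may be less than len(data) on a partial write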

While this is atomic, it is not inherently safe. The write could turn out to be partial (typically only if the disk is full), or the operating system could crash during the write, leaving you with a target file that matches neither source a nor source b. If that's acceptable because you made a backup, go ahead and use it :)
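If you go this route, you can at least detect the partial-write case. The sketch below uses os.pwrite so the offset is part of the same system call, and calls os.fsync afterwards. The function name overwrite_once is made up for this example. Detecting a short write does not undo it, and fsync narrows the crash window rather than closing it, so a backup is still needed.

```python
import os


def overwrite_once(path, data):
    """Overwrite path from offset 0 with one pwrite call; fail loudly if short."""
    fd = os.open(path, os.O_WRONLY)        # existing inode, no truncation
    try:
        written = os.pwrite(fd, data, 0)   # one system call, explicit offset 0
        if written != len(data):
            raise OSError(f"partial write: {written} of {len(data)} bytes")
        os.fsync(fd)                       # ask the OS to flush to stable storage
    finally:
        os.close(fd)
    return written
```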

P.S. If the file size is not fixed, adopt a file format where the file header specifies the size of the data in the file.
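One possible shape for such a format is sketched below. The 8-byte little-endian length prefix and the names pack_record and unpack_record are assumptions made for this example, not an established format. Because the header and the payload are written together in one buffer, a reader can ignore any stale bytes left over from older, longer content.

```python
import struct

# Hypothetical layout: an 8-byte little-endian length, then the payload.
HEADER = struct.Struct("<Q")


def pack_record(payload):
    """Return header plus payload as one buffer, ready for a single write."""
    return HEADER.pack(len(payload)) + payload


def unpack_record(blob):
    """Return only the bytes the header declares, ignoring any stale tail."""
    (size,) = HEADER.unpack_from(blob, 0)
    body = blob[HEADER.size:HEADER.size + size]
    if len(body) != size:
        raise ValueError("file is shorter than its header claims")
    return body
```

The buffer returned by pack_record is what you would pass to the single write call in the code above.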

P.P.S. Although the sendfile system call now works on regular files for both input and output, testing shows that the operation is not atomic. Here one thread tried to send 1000M of zeros and another 1000M of ff's. The resulting file is exactly 1000M, but the data is interleaved. The return value of one of the sendfile calls shows a partial write, but the reported size is inconsistent with the number of zeros actually written:

(env33)[dima@bmg ~]$ hexdump oux 
0000000 ffff ffff ffff ffff ffff ffff ffff ffff
*
03c0000 0000 0000 0000 0000 0000 0000 0000 0000
*
3e800000
Dima Tisnek answered Oct 14 '25 20:10
