I have a long-running process which writes a lot of stuff to a file. The result should be everything or nothing, so I write to a temporary file and rename it to the real name at the end. Currently, my code is like this:
    filename = 'whatever'
    tmpname = 'whatever' + str(time.time())

    with open(tmpname, 'wb') as fp:
        fp.write(stuff)
        fp.write(more_stuff)

    if os.path.exists(filename):
        os.unlink(filename)
    os.rename(tmpname, filename)
I'm not happy with that for several reasons:
Any suggestions how to improve my code? Is there a library that can help me out?
You can use Python's tempfile module to give you a temporary file name. It can create a temporary file in a thread-safe manner, rather than making one up using time.time(), which may return the same name if used in multiple threads at the same time.
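To make the difference concrete, here is a minimal sketch. The file name `whatever` comes from the question; the scratch directory and the variable names are only there for illustration. `tempfile.mkstemp()` creates and opens the file in one step, so two calls cannot hand back the same name:

```python
import os
import tempfile

# Scratch directory so the demonstration does not touch real files.
workdir = tempfile.mkdtemp()
target = os.path.join(workdir, 'whatever')  # final name, as in the question

# mkstemp() creates the file atomically and returns (fd, path).
# Using the target's directory keeps a later rename on one file system.
prefix = os.path.basename(target) + '.'
fd1, path1 = tempfile.mkstemp(prefix=prefix, dir=workdir)
fd2, path2 = tempfile.mkstemp(prefix=prefix, dir=workdir)

names_differ = path1 != path2  # each call created a distinct file

# Clean up the demonstration files.
for fd, path in ((fd1, path1), (fd2, path2)):
    os.close(fd)
    os.unlink(path)
```

The same two calls built from `time.time()` could produce identical names when made within the clock's resolution.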
As suggested in a comment to your question, this can be coupled with the use of a context manager. You can get some ideas of how to implement what you want to do by looking at the Python tempfile.py sources.
The following code snippet may do what you want. It wraps the object returned from tempfile.NamedTemporaryFile.
There is no separate check between os.path.exists() and os.rename(), which could introduce a race condition.
For an atomic rename on Linux, the source and destination must be on the same file system, which is why this code places the temporary file in the same directory as the destination file.
The RenamedTemporaryFile class should behave like a NamedTemporaryFile for most purposes, except that when it is closed using the context manager, the file is renamed.
Sample:
    import os
    import tempfile

    class RenamedTemporaryFile(object):
        """
        A temporary file object which will be renamed to the specified
        path on exit.
        """
        def __init__(self, final_path, **kwargs):
            tmpfile_dir = kwargs.pop('dir', None)

            # Put temporary file in the same directory as the location for the
            # final file so that an atomic move into place can occur.
            if tmpfile_dir is None:
                tmpfile_dir = os.path.dirname(final_path)

            # delete=False: the file is renamed or removed explicitly in
            # __exit__ instead of being deleted automatically on close.
            kwargs['delete'] = False
            self.tmpfile = tempfile.NamedTemporaryFile(dir=tmpfile_dir, **kwargs)
            self.final_path = final_path

        def __getattr__(self, attr):
            """
            Delegate attribute access to the underlying temporary file object.
            """
            return getattr(self.tmpfile, attr)

        def __enter__(self):
            self.tmpfile.__enter__()
            return self

        def __exit__(self, exc_type, exc_val, exc_tb):
            # Close the file first, then either commit or roll back.
            result = self.tmpfile.__exit__(exc_type, exc_val, exc_tb)
            if exc_type is None:
                os.rename(self.tmpfile.name, self.final_path)
            else:
                os.unlink(self.tmpfile.name)
            return result
You can then use it like this:
    with RenamedTemporaryFile('whatever') as f:
        f.write(b'stuff')
During writing, the contents go to a temporary file; on exit, the file is renamed. This code will probably need some tweaks, but the general idea should help you get started.
To write all or nothing to a file reliably:
    import os
    from contextlib import contextmanager
    from tempfile import NamedTemporaryFile

    if not hasattr(os, 'replace'):
        os.replace = os.rename  # NOTE: it won't work for existing files on Windows

    @contextmanager
    def FaultTolerantFile(name):
        dirpath, filename = os.path.split(name)
        # use the same dir for os.replace() to work
        # delete=False: don't delete tmp file if `replace()` fails
        with NamedTemporaryFile(dir=dirpath, prefix=filename, suffix='.tmp',
                                delete=False) as f:
            try:
                yield f
                f.flush()             # libc -> OS
                os.fsync(f.fileno())  # OS -> disc (note: on OSX it is not enough)
            except BaseException:
                f.close()             # close before removing (needed on Windows)
                os.unlink(f.name)     # nothing is written to `name`
                raise
        os.replace(f.name, name)      # tmp file is closed at this point
See also "Is rename() without fsync() safe?" (mentioned by @Mihai Stan).
Usage:
    with FaultTolerantFile('very_important_file') as file:
        file.write(b'either all ')
        file.write(b'or nothing is written')
To implement the missing os.replace() on Windows, you could call MoveFileExW(src, dst, MOVEFILE_REPLACE_EXISTING) (via the win32file or ctypes modules).
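A rough sketch of that fallback through ctypes might look like the following. The function name `replace_file` is made up for this example, and the flag value is the one defined in the Windows API headers. The Windows branch only runs where os.replace() is missing, so treat it as untested illustration rather than a drop-in implementation:

```python
import os
import sys
import tempfile

MOVEFILE_REPLACE_EXISTING = 0x1  # value from the Windows API headers

def replace_file(src, dst):
    """Hypothetical helper: rename src to dst, overwriting dst if present."""
    if hasattr(os, 'replace'):
        # Python 3.3+ already provides an overwriting rename everywhere.
        os.replace(src, dst)
    elif sys.platform == 'win32':
        import ctypes
        from ctypes import wintypes
        kernel32 = ctypes.WinDLL('kernel32', use_last_error=True)
        kernel32.MoveFileExW.argtypes = (wintypes.LPCWSTR, wintypes.LPCWSTR,
                                         wintypes.DWORD)
        kernel32.MoveFileExW.restype = wintypes.BOOL
        if not kernel32.MoveFileExW(src, dst, MOVEFILE_REPLACE_EXISTING):
            raise ctypes.WinError(ctypes.get_last_error())
    else:
        # POSIX rename() already replaces an existing destination.
        os.rename(src, dst)

# Small demonstration in a scratch directory.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, 'whatever.tmp')
dst = os.path.join(workdir, 'whatever')
with open(dst, 'wb') as f:
    f.write(b'old')
with open(src, 'wb') as f:
    f.write(b'new')
replace_file(src, dst)
with open(dst, 'rb') as f:
    content = f.read()
```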
In case of multiple threads, you could call queue.put(data) from different threads and write to the file in a dedicated thread:
    for data in iter(queue.get, None):
        file.write(data)
queue.put(None) breaks the loop.
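Put together, the dedicated-writer idea could be sketched like this. The `writer` and `produce` functions, the number of producer threads, and the scratch directory are all invented for the example:

```python
import os
import queue
import tempfile
import threading

def writer(q, fileobj):
    # Only this thread touches the file; None is the stop sentinel.
    for data in iter(q.get, None):
        fileobj.write(data)

def produce(q, n):
    # Each producer thread only enqueues data, it never writes directly.
    q.put(('line from thread %d\n' % n).encode('ascii'))

workdir = tempfile.mkdtemp()
path = os.path.join(workdir, 'whatever')
q = queue.Queue()

with open(path, 'wb') as fileobj:
    writer_thread = threading.Thread(target=writer, args=(q, fileobj))
    writer_thread.start()

    producers = [threading.Thread(target=produce, args=(q, n)) for n in range(4)]
    for t in producers:
        t.start()
    for t in producers:
        t.join()

    q.put(None)           # all producers are done: stop the writer
    writer_thread.join()  # wait until everything queued has been written

with open(path, 'rb') as f:
    lines = f.read().splitlines()
```

The order of the lines depends on thread scheduling, but each queued item is written whole because only one thread ever calls write().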
As an alternative you could use locks (threading, multiprocessing, filelock) to synchronize access:
    def write(self, data):
        with self.lock:
            self.file.write(data)
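For context, that method could live in a small wrapper class such as the one below. The class name `LockedWriter` and the demonstration around it are assumptions made for this sketch; only the write() method comes from the snippet above:

```python
import os
import tempfile
import threading

class LockedWriter(object):
    """Hypothetical wrapper that serializes write() calls with a lock."""
    def __init__(self, fileobj):
        self.file = fileobj
        self.lock = threading.Lock()

    def write(self, data):
        # Only one thread at a time may be inside this block.
        with self.lock:
            self.file.write(data)

workdir = tempfile.mkdtemp()
path = os.path.join(workdir, 'whatever')
record = b'x' * 1000 + b'\n'

with open(path, 'wb') as fileobj:
    out = LockedWriter(fileobj)
    threads = [threading.Thread(target=out.write, args=(record,))
               for _ in range(8)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

with open(path, 'rb') as f:
    lines = f.read().splitlines()
```

A threading.Lock only protects threads within one process; for several processes you would need one of the other lock types mentioned above.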
The with construct is useful for cleaning up on exit, but not for the commit/rollback system you want. A try/except/else block can be used for that.
You should also use a standard way of creating the temporary file name, for example with the tempfile module.
And remember to fsync before the rename.
Below is the full modified code:
    import os, tempfile

    def begin_file(filepath):
        (filedir, filename) = os.path.split(filepath)
        # Create the temporary file next to the target so that the final
        # rename stays on one file system. delete=False because the file is
        # committed or rolled back by hand.
        return tempfile.NamedTemporaryFile(
            mode='wb', prefix=filename + '_', dir=filedir or '.', delete=False)

    def commit_file(f, filepath):
        f.flush()                     # Python buffers -> OS
        os.fsync(f.fileno())          # OS -> disk
        f.close()
        os.replace(f.name, filepath)  # overwrites an existing file in one step

    def rollback_file(f):
        tmppath = f.name
        f.close()
        os.unlink(tmppath)

    fp = begin_file('whatever')
    try:
        fp.write(b'stuff')
    except:
        rollback_file(fp)
        raise
    else:
        commit_file(fp, 'whatever')
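The "fsync before rename" step can be pulled out into a helper, roughly as sketched here. The name `sync_to_disk` is invented for this example. The F_FULLFSYNC branch reflects my understanding that on macOS a plain fsync does not force the drive to flush its own cache; it is guarded so the code falls back to os.fsync() wherever that constant is missing or the request fails:

```python
import os
import tempfile

try:
    import fcntl  # not available on Windows
except ImportError:
    fcntl = None

def sync_to_disk(f):
    """Hypothetical helper: push buffered data of file object f to disk."""
    f.flush()  # Python buffers -> OS
    fd = f.fileno()
    if fcntl is not None and hasattr(fcntl, 'F_FULLFSYNC'):
        try:
            # macOS: ask the drive to flush its cache as well.
            fcntl.fcntl(fd, fcntl.F_FULLFSYNC)
            return
        except OSError:
            pass  # not supported on this file system, fall back below
    os.fsync(fd)  # OS -> disk

# Demonstration: write, sync, then rename into place.
workdir = tempfile.mkdtemp()
final_path = os.path.join(workdir, 'whatever')
with tempfile.NamedTemporaryFile(dir=workdir, delete=False) as tmp:
    tmp.write(b'stuff')
    sync_to_disk(tmp)  # data is on disk before the name changes
os.replace(tmp.name, final_path)

with open(final_path, 'rb') as f:
    content = f.read()
```

Syncing before the rename matters because otherwise a crash shortly after the rename could leave the final name pointing at an empty or partial file.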