 

Threadsafe and fault-tolerant file writes

I have a long-running process which writes a lot of stuff to a file. The result should be everything or nothing, so I'm writing to a temporary file and renaming it to the real name at the end. Currently, my code looks like this:

import os
import time

filename = 'whatever'
tmpname = 'whatever' + str(time.time())

with open(tmpname, 'wb') as fp:
    fp.write(stuff)
    fp.write(more_stuff)

if os.path.exists(filename):
    os.unlink(filename)
os.rename(tmpname, filename)

I'm not happy with that for several reasons:

  • it doesn't clean up properly if an exception occurs
  • it ignores concurrency issues
  • it isn't reusable (I need this in different places in my program)

Any suggestions how to improve my code? Is there a library that can help me out?

asked Aug 17 '12 at 10:08 by georg



3 Answers

You can use Python's tempfile module to give you a temporary file name. It creates temporary files in a thread-safe manner, rather than making up a name with time.time(), which may return the same value if called from multiple threads at the same time.

As suggested in a comment to your question, this can be coupled with the use of a context manager. You can get some ideas of how to implement what you want by looking at the source of Python's tempfile.py.
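To see the difference from a time.time()-based name, the sketch below (the helper name `make_temp` is hypothetical) calls `tempfile.mkstemp()` from 20 threads at once and collects the paths. `mkstemp()` opens each file with `O_CREAT | O_EXCL` and retries with a fresh random name on collision, so no two callers can end up with the same file.

```python
import os
import tempfile
import threading

def make_temp(directory, results, lock):
    # mkstemp() creates the file exclusively (O_CREAT | O_EXCL), so each
    # call yields a distinct, already-created file even under concurrency.
    fd, path = tempfile.mkstemp(prefix='whatever', suffix='.tmp', dir=directory)
    os.close(fd)
    with lock:
        results.append(path)

directory = tempfile.mkdtemp()
results = []
lock = threading.Lock()
threads = [threading.Thread(target=make_temp, args=(directory, results, lock))
           for _ in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(results), len(set(results)))  # 20 20
```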

The following code snippet may do what you want. It wraps and delegates to the object returned by tempfile.NamedTemporaryFile.

  • Creation of temporary files is thread-safe.
  • Renaming the file upon successful completion is atomic, at least on Linux (POSIX). There is no separate os.path.exists() check before the rename, which could introduce a race condition. For an atomic rename, the source and destination must be on the same file system, which is why this code places the temporary file in the same directory as the destination file.
  • The RenamedTemporaryFile class should behave like a NamedTemporaryFile for most purposes, except that when it is closed via the context manager, the file is renamed.

Sample:

import os
import tempfile

class RenamedTemporaryFile(object):
    """
    A temporary file object which will be renamed to the specified
    path on exit.
    """
    def __init__(self, final_path, **kwargs):
        tmpfile_dir = kwargs.pop('dir', None)
        kwargs.pop('delete', None)

        # Put temporary file in the same directory as the location for the
        # final file so that an atomic move into place can occur.
        if tmpfile_dir is None:
            tmpfile_dir = os.path.dirname(os.path.abspath(final_path))

        # delete=False: __exit__ decides whether to rename or remove the file.
        # (Assigning .delete on the wrapper later has no effect on Python 3.)
        self.tmpfile = tempfile.NamedTemporaryFile(dir=tmpfile_dir, delete=False,
                                                   **kwargs)
        self.final_path = final_path

    def __getattr__(self, attr):
        """
        Delegate attribute access to the underlying temporary file object.
        """
        return getattr(self.tmpfile, attr)

    def __enter__(self):
        self.tmpfile.__enter__()
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        result = self.tmpfile.__exit__(exc_type, exc_val, exc_tb)  # closes it
        if exc_type is None:
            # os.replace() also overwrites an existing file on Windows,
            # where os.rename() would fail.
            os.replace(self.tmpfile.name, self.final_path)
        else:
            os.unlink(self.tmpfile.name)  # discard the incomplete file
        return result

You can then use it like this:

with RenamedTemporaryFile('whatever') as f:
    f.write(b'stuff')  # binary mode by default, so write bytes

While writing, the contents go to a temporary file; on exit, the file is renamed. This code will probably need some tweaks, but the general idea should help you get started.
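The rename step these properties rely on can be seen in isolation. A minimal sketch (the function name `replace_atomically` is hypothetical): new contents are written next to an existing file and moved into place with `os.replace()`, which overwrites the destination in one step. On POSIX systems a reader of the path sees either the old or the new contents, never a missing or half-written file.

```python
import os
import tempfile

def replace_atomically(path, data):
    # Same directory as the target, so the rename never crosses file systems.
    dirpath = os.path.dirname(os.path.abspath(path))
    fd, tmppath = tempfile.mkstemp(dir=dirpath, suffix='.tmp')
    try:
        with os.fdopen(fd, 'wb') as f:
            f.write(data)
        # Overwrites an existing file; atomic on POSIX, with no
        # exists()/unlink() gap in between.
        os.replace(tmppath, path)
    except BaseException:
        os.unlink(tmppath)  # leave the original file untouched
        raise

workdir = tempfile.mkdtemp()
target = os.path.join(workdir, 'whatever')
with open(target, 'wb') as f:
    f.write(b'old contents')

replace_atomically(target, b'new contents')
with open(target, 'rb') as f:
    print(f.read())                 # b'new contents'
print(sorted(os.listdir(workdir)))  # ['whatever']
```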

answered Nov 13 '22 at 17:11 by Austin Phillips



To write all or nothing to a file reliably:

import os
from contextlib import contextmanager
from tempfile import NamedTemporaryFile

if not hasattr(os, 'replace'):
    os.replace = os.rename  # NOTE: Python < 3.3; won't overwrite existing files on Windows

@contextmanager
def FaultTolerantFile(name):
    dirpath, filename = os.path.split(name)
    # use the same dir so that os.replace() is an atomic rename;
    # delete=False because assigning f.delete later has no effect on Python 3
    f = NamedTemporaryFile(dir=dirpath, prefix=filename, suffix='.tmp',
                           delete=False)
    try:
        with f:
            yield f
            f.flush()    # libc -> OS
            os.fsync(f)  # OS -> disk (note: on OS X it is not enough)
    except BaseException:
        os.unlink(f.name)  # discard the incomplete tmp file
        raise
    os.replace(f.name, name)  # the tmp file is kept if replace() fails

See also "Is rename() without fsync() safe?" (mentioned by @Mihai Stan).

Usage

with FaultTolerantFile('very_important_file') as f:
    f.write(b'either all ')  # the file is opened in binary mode
    f.write(b'or nothing is written')
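On the rename()/fsync() question: on POSIX systems the rename is recorded in the directory, so a common extra step for durability (not part of the code above) is to fsync the containing directory after `os.replace()`. A minimal sketch, assuming a POSIX platform where a directory can be opened read-only; the helper name `fsync_dir` is hypothetical and it does nothing on Windows.

```python
import os
import tempfile

def fsync_dir(path):
    # Flush the directory entry (e.g. a just-completed rename) to disk.
    # Assumption: POSIX only; os.open() on a directory fails on Windows.
    if os.name != 'posix':
        return
    dirpath = os.path.dirname(os.path.abspath(path))
    fd = os.open(dirpath, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)

workdir = tempfile.mkdtemp()
target = os.path.join(workdir, 'very_important_file')
fd, tmppath = tempfile.mkstemp(dir=workdir)
with os.fdopen(fd, 'wb') as f:
    f.write(b'either all or nothing is written')
    f.flush()             # Python buffers -> OS
    os.fsync(f.fileno())  # file contents -> disk
os.replace(tmppath, target)
fsync_dir(target)         # directory entry (the rename) -> disk
```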

On Windows with Python versions before 3.3, which lack os.replace(), you could implement it by calling MoveFileExW(src, dst, MOVEFILE_REPLACE_EXISTING) via the win32file or ctypes modules.

If several threads produce data, they could each call queue.put(data) while a single dedicated thread writes to the file:

for data in iter(queue.get, None):
    file.write(data)

queue.put(None) breaks the loop.
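The queue pattern as a self-contained sketch: four producer threads put chunks on a `queue.Queue`, and one writer thread is the only code that touches the file, so no lock around `write()` is needed. A plain file and the name `output.txt` are used for brevity; in practice the writer would hold the all-or-nothing file object instead.

```python
import os
import queue
import tempfile
import threading

def writer(q, f):
    # The only thread that writes; stops when it receives the None sentinel.
    for data in iter(q.get, None):
        f.write(data)

def producer(q, n):
    for i in range(3):
        q.put(b'%d-%d\n' % (n, i))

q = queue.Queue()
path = os.path.join(tempfile.mkdtemp(), 'output.txt')
with open(path, 'wb') as f:
    w = threading.Thread(target=writer, args=(q, f))
    w.start()
    producers = [threading.Thread(target=producer, args=(q, n)) for n in range(4)]
    for p in producers:
        p.start()
    for p in producers:
        p.join()
    q.put(None)  # all producers are done: stop the writer
    w.join()

with open(path, 'rb') as f:
    lines = f.read().splitlines()
print(len(lines))  # 12
```

The sentinel is put only after every producer has been joined; otherwise the writer could stop while data is still arriving.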

As an alternative, you could use locks (threading, multiprocessing, filelock) to synchronize access:

def write(self, data):
    with self.lock:
        self.file.write(data)
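The lock-based variant as a self-contained sketch: the class name `LockedWriter` is hypothetical, and `threading.Lock` stands in for whichever lock fits (across processes, a `multiprocessing.Lock` or the third-party `filelock` package would be needed instead). Each `write()` holds the lock, so chunks from different threads are written one at a time.

```python
import os
import tempfile
import threading

class LockedWriter(object):
    def __init__(self, file):
        self.file = file
        self.lock = threading.Lock()

    def write(self, data):
        # Only one thread at a time may write a chunk.
        with self.lock:
            self.file.write(data)

def worker(w, n):
    for i in range(100):
        w.write(b'thread %d line %d\n' % (n, i))

path = os.path.join(tempfile.mkdtemp(), 'output.txt')
with open(path, 'wb') as f:
    w = LockedWriter(f)
    threads = [threading.Thread(target=worker, args=(w, n)) for n in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

with open(path, 'rb') as f:
    lines = f.read().splitlines()
print(len(lines))  # 400
```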
answered Nov 13 '22 at 16:11 by jfs



The with construct is useful for cleaning up on exit, but not for the commit/rollback system you want. A try/except/else block can be used for that.

You should also use a standard way of creating the temporary file name, for example the tempfile module.

And remember to fsync before renaming.

Below is the full modified code:

import os
import tempfile

def begin_file(filepath):
    filedir, filename = os.path.split(filepath)
    # Safe creation (unlike the deprecated, racy mktemp()), in the same
    # directory as the target so the final rename stays on one file system
    return tempfile.NamedTemporaryFile(prefix=filename + '_', dir=filedir,
                                       delete=False)

def commit_file(f, filepath):
    f.flush()             # Python buffers -> OS
    os.fsync(f.fileno())  # OS -> disk
    f.close()
    # os.replace() overwrites the target in one step (atomic on POSIX), so
    # there is no window where the file is missing, unlike unlink() + rename()
    os.replace(f.name, filepath)

def rollback_file(f):
    f.close()
    os.unlink(f.name)


fp = begin_file('whatever')
try:
    fp.write(b'stuff')
except BaseException:
    rollback_file(fp)
    raise
else:
    commit_file(fp, 'whatever')
answered Nov 13 '22 at 16:11 by Mihai Stan
