A safe, atomic file-copy operation

I need to copy a file from one location to another, and I need to throw an exception (or at least somehow recognise) if the file already exists at the destination (no overwriting).

I can check first with os.path.exists() but it's extremely important that the file cannot be created in the small amount of time between checking and copying.

Is there a built-in way of doing this, or is there a way to define an action as atomic?

Ivy asked Jul 23 '12 14:07


People also ask

What is an atomic file operation?

Several Files methods, such as move, can perform certain operations atomically in some file systems. An atomic file operation is an operation that cannot be interrupted or "partially" performed. Either the entire operation is performed or the operation fails.

Is shutil copy atomic?

No, it seems to just loop, reading and writing 16KB at a time. For an atomic copy operation, you should copy the file to a different location on the same filesystem, and then os.rename() it to the desired location (which is guaranteed to be atomic on Linux).

Is renaming a file atomic?

In POSIX, a successful call to rename is guaranteed to have been atomic from the point of view of the current host (i.e., another program would only see the file with the old name or the file with the new name, not both or neither of them).

What is atomic file update?

An atomic operation is one that changes a system from one state to another without visibly passing through any intermediate states. Atomicity is desirable when altering the content of a file because: The process performing the alteration may fail or be stopped, leaving the file in an incomplete or inconsistent state.


1 Answer

There is in fact a way to do this, atomically and safely, provided all actors do it the same way. It's an adaptation of the lock-free whack-a-mole algorithm, and not entirely trivial, so feel free to go with "no" as the general answer ;)

What to do

  1. Check whether the file already exists. Stop if it does.
  2. Generate a unique ID.
  3. Copy the source file to the target folder with a temporary name, say, <target>.<UUID>.tmp.
  4. Rename the copy to <target>-<UUID>.mole.tmp.
  5. Look for any other files matching the pattern <target>-*.mole.tmp.
    • If their UUID compares greater than yours, attempt to delete their file. (Don't worry if it's gone.)
    • If their UUID compares less than yours, attempt to delete your own. (Again, don't worry if it's gone.) From now on, treat their UUID as if it were your own.
  6. Check again to see if the destination file already exists. If so, attempt to delete your temporary file. (Don't worry if it's gone. Remember your UUID may have changed in step 5.)
  7. If you didn't already attempt to delete it in step 6, attempt to rename your temporary file to its final name, <target>. (Don't worry if it's gone, just jump back to step 5.)

You're done!
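
The steps above can be sketched in Python roughly as follows. This is a minimal sketch, not a hardened implementation: the function name copy_no_overwrite and the helper _remove_quietly are made up for illustration, it assumes every process writing to the target follows the same protocol, and it assumes the temporary files live on the same file system as the target. One point the steps leave open is what to report when you end up renaming an adopted file; here that is treated as "someone else's copy won" and raises FileExistsError, which is an interpretation rather than part of the algorithm.

```python
import glob
import os
import shutil
import uuid


def _remove_quietly(path):
    """Delete path; it is fine if somebody else already removed it."""
    try:
        os.remove(path)
    except FileNotFoundError:
        pass


def copy_no_overwrite(source, target):
    """Copy source to target, raising FileExistsError if target exists.

    Hypothetical helper: only safe if all writers use this same protocol.
    """
    if os.path.exists(target):                      # step 1
        raise FileExistsError(target)

    my_id = uuid.uuid4().hex                        # step 2: strings compare totally
    scratch = f"{target}.{my_id}.tmp"
    shutil.copyfile(source, scratch)                # step 3: nobody looks at this name

    prefix, suffix = target + "-", ".mole.tmp"
    os.rename(scratch, prefix + my_id + suffix)     # step 4: publish the finished copy

    current = my_id                                 # the ID we currently back
    pattern = glob.escape(prefix) + "*" + suffix
    while True:
        for other in glob.glob(pattern):            # step 5: whack competitors
            other_id = other[len(prefix):-len(suffix)]
            if other_id > current:
                _remove_quietly(other)
            elif other_id < current:
                _remove_quietly(prefix + current + suffix)
                current = other_id                  # adopt the lower ID

        mole = prefix + current + suffix
        if os.path.exists(target):                  # step 6: somebody finished first
            _remove_quietly(mole)
            raise FileExistsError(target)
        try:
            os.rename(mole, target)                 # step 7
        except FileNotFoundError:
            continue                                # mole was whacked: back to step 5
        if current != my_id:
            # Assumption: finishing an adopted copy counts as "already exists".
            raise FileExistsError(target)
        return
```

A caller would use it like shutil.copyfile, catching FileExistsError to detect that the destination was taken, for example copy_no_overwrite("report.csv", "/shared/report.csv") with placeholder paths.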

How it works

Imagine each candidate source file is a mole coming out of its hole. Half-way out, it pauses and whacks any competing moles back into the ground, before checking no other mole has fully emerged. If you run this through in your head, you should see that only one mole will ever make it all the way out. To prevent this system from livelocking, we add a total ordering on which mole can whack which. Bam! A lock-free algorithm.
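
To see why the total ordering stops two moles from whacking each other forever, here is a small pure-Python illustration. The function whack is a made-up name, and it models only the decision made in step 5 (which ID survives), not the file operations. Because every contender applies the same "lowest ID wins" rule to the same set of IDs, they all reach the same verdict.

```python
def whack(my_id, visible_ids):
    """Return (winning_id, ids_to_delete) as one contender would decide.

    Hypothetical helper: the lowest ID wins, mirroring the comparison
    rule in step 5. Everything else is a candidate for deletion.
    """
    contenders = set(visible_ids) | {my_id}
    winner = min(contenders)
    return winner, sorted(contenders - {winner})


# Two competing processes that can see each other's mole files
# come to the same conclusion, whichever one asks.
print(whack("a1", ["a1", "b2"]))
print(whack("b2", ["a1", "b2"]))
```

Both calls pick "a1" as the survivor and "b2" as the file to delete, so the loser backs the winner's file instead of fighting it.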

Step 4 may look unnecessary: why not just use that name in the first place? However, another process may "adopt" your mole file in step 5, and make it the winner in step 7, so it's very important that you're not still writing out the contents! Renames on the same file system are atomic, so step 4 is safe.
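
The write-then-rename idea behind steps 3 and 4 can be shown in isolation. In this sketch (the function name write_then_publish and the scratch-file naming are assumptions for illustration), the data is written under a scratch name first, and the published name only appears once the contents are complete. It assumes the scratch file and the published name are on the same file system.

```python
import os
import uuid


def write_then_publish(chunks, final_path):
    """Write chunks to a scratch file, then rename it to final_path.

    Hypothetical helper: while the chunks are being written, final_path
    does not exist, so nobody can adopt a half-written file.
    """
    scratch = f"{final_path}.{uuid.uuid4().hex}.tmp"
    with open(scratch, "wb") as out:
        for chunk in chunks:
            out.write(chunk)
        out.flush()
        os.fsync(out.fileno())      # push the data to disk before publishing
    os.rename(scratch, final_path)  # the complete file appears in one step
    return final_path
```

The os.fsync call is an extra precaution not mentioned in the steps; it can be dropped if durability across a crash does not matter.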

Alice Purcell answered Sep 19 '22 16:09