I need to copy a file from one location to another, and I need to throw an exception (or at least somehow recognise) if the file already exists at the destination (no overwriting). I can check first with os.path.exists() but it's extremely important that the file cannot be created in the small amount of time between checking and copying. Is there a built-in way of doing this, or is there a way to define an action as atomic?

There is in fact a way to do this, atomically and safely, provided all actors do it the same way. It's an adaptation of the lock-free whack-a-mole algorithm, and not entirely trivial, so feel free to go with "no" as the general answer ;) <h3>What to do</h3> <ol> <li>Check whether the file already exists. Stop if it does.</li> <li>Generate a unique ID</li> <li>Copy the source file to the target folder with a temporary name, say, <code><target>.<UUID>.tmp</code>.</li> <li>Rename&dagger; the copy <code><target>-<UUID>.mole.tmp</code>.</li> <li> Look for any other files matching the pattern <code><target>-*.mole.tmp</code>. <ul> <li>If their UUID compares greater than yours, attempt to delete it. (Don't worry if it's gone.)</li> <li>If their UUID compares less than yours, attempt to delete your own. (Again, don't worry if it's gone.) From now on, treat their UUID as if it were your own.</li> </ul> </li> <li>Check again to see if the destination file already exists. If so, attempt to delete your temporary file. (Don't worry if it's gone. Remember your UUID may have changed in step 5.)</li> <li>If you didn't already attempt to delete it in step 6, attempt to rename your temporary file to its final name, <code><target></code>. (Don't worry if it's gone, just jump back to step 5.)</li> </ol> You're done! <h3>How it works</h3> Imagine each candidate source file is a mole coming out of its hole. Half-way out, it pauses and whacks any competing moles back into the ground, before checking no other mole has fully emerged. If you run this through in your head, you should see that only one mole will ever make it all the way out. To prevent this system from livelocking, we add a total ordering on which mole can whack which. Bam! A <strike> PhD thesis </strike> lock-free algorithm. &dagger; Step 4 may look unnecessary—why not just use that name in the first place? However, another process may "adopt" your <strike> mole </strike> file in step 5, and make it the winner in step 7, so it's very important that you're not still writing out the contents! Renames on the same file system are atomic, so step 4 is safe.

A safe, atomic file-copy operation

1 Answers

There is in fact a way to do this, atomically and safely, provided all actors do it the same way. It's an adaptation of the lock-free whack-a-mole algorithm, and not entirely trivial, so feel free to go with "no" as the general answer ;)

What to do

Check whether the file already exists. Stop if it does.
Generate a unique ID
Copy the source file to the target folder with a temporary name, say, <target>.<UUID>.tmp.
Rename^† the copy <target>-<UUID>.mole.tmp.
Look for any other files matching the pattern <target>-*.mole.tmp.
- If their UUID compares greater than yours, attempt to delete it. (Don't worry if it's gone.)
- If their UUID compares less than yours, attempt to delete your own. (Again, don't worry if it's gone.) From now on, treat their UUID as if it were your own.
Check again to see if the destination file already exists. If so, attempt to delete your temporary file. (Don't worry if it's gone. Remember your UUID may have changed in step 5.)
If you didn't already attempt to delete it in step 6, attempt to rename your temporary file to its final name, <target>. (Don't worry if it's gone, just jump back to step 5.)

You're done!

How it works

Imagine each candidate source file is a mole coming out of its hole. Half-way out, it pauses and whacks any competing moles back into the ground, before checking no other mole has fully emerged. If you run this through in your head, you should see that only one mole will ever make it all the way out. To prevent this system from livelocking, we add a total ordering on which mole can whack which. Bam! A ~~PhD thesis~~ lock-free algorithm.

^† Step 4 may look unnecessary—why not just use that name in the first place? However, another process may "adopt" your ~~mole~~ file in step 5, and make it the winner in step 7, so it's very important that you're not still writing out the contents! Renames on the same file system are atomic, so step 4 is safe.

answered Sep 19 '22 16:09

Alice Purcell

Related questions
                            
                                Does Kafka python API support stream processing?
                            
                                Django one of 2 fields must not be null
                            
                                Ansible + Ubuntu 18.04 + MySQL = "The PyMySQL (Python 2.7 and Python 3.X) or MySQL-python (Python 2.X) module is required."
                            
                                What is the difference between MaxPool and MaxPooling layers in Keras?
                            
                                Determine if a named parameter was passed
                            
                                Embedding icon in .exe with py2exe, visible in Vista?
                            
                                Regular expression implementation details
                            
                                Using python to develop web application
                            
                                tokenize a string keeping delimiters in Python
                            
                                How to plot data against specific dates on the x-axis using matplotlib
                            
                                Can I use cStringIO the same as StringIO?
                            
                                Element-wise power of scipy.sparse matrix
                            
                                Should I create pipeline to save files with scrapy?
                            
                                Unique combination of fields in SQLite?
                            
                                How to fix pylint warning "Abstract class not referenced"?
                            
                                How redirect a shell command output to a Python script input ?
                            
                                Docstrings - one line vs multiple line
                            
                                pylab matplotlib "show" waits until window closes
                            
                                How to correct "TypeError: 'NoneType' object is not subscriptable" in recursive function?
                            
                                Python leading underscore _variables

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

A safe, atomic file-copy operation

Tags:

python

concurrency

Ivy

People also ask

1 Answers

What to do

How it works

Alice Purcell

Recent Activity

Donate For Us