Can I asynchronously delete a file in Python?

I have a long-running Python script which creates and deletes temporary files. I notice a non-trivial amount of time is spent on file deletion, but the only purpose of deleting those files is to ensure that the program doesn't eventually fill up all the disk space during a long run. Is there a cross-platform mechanism in Python to asynchronously delete a file, so the main thread can continue to work while the OS takes care of the file delete?

Asked by Rich, Sep 27 '13


2 Answers

You can try delegating file deletion to another thread or process.

Using a newly spawned thread (note that the arguments must be passed as a tuple):

import thread, os

thread.start_new_thread(os.remove, (filename,))
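On Python 3 the thread module is gone; a rough equivalent with the standard threading module looks like this (a sketch, one short-lived thread per delete):

import os
import threading

# fire-and-forget: spawn a thread that performs a single deletion
threading.Thread(target=os.remove, args=(filename,)).start()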

Or, using a process:

import multiprocessing, os

# create the process pool once
process_pool = multiprocessing.Pool(1)
results = []

# later on, removing a file in async fashion
# note: need to hold on to the async result until it has completed
results.append(process_pool.apply_async(os.remove, (filename,)))

# periodically prune results that have already completed
results = [r for r in results if not r.ready()]

The process version may allow for more parallelism because Python threads do not execute in parallel due to the notorious global interpreter lock. I would expect, though, that the GIL is released when any blocking kernel function is called, such as unlink(), so that Python lets another thread make progress. In other words, a background worker thread that calls os.unlink() may be the best solution; see Tim Peters' answer.

Yet, multiprocessing uses Python threads underneath to communicate asynchronously with the processes in the pool, so some benchmarking is required to figure out which version gives more parallelism.
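On Python 3 (or Python 2 with the futures backport), the same delegation can be written with concurrent.futures; a minimal sketch with a single worker thread (the remove_async wrapper is just an illustrative name):

import os
from concurrent.futures import ThreadPoolExecutor

# one long-lived worker thread services all deletion requests
executor = ThreadPoolExecutor(max_workers=1)

def remove_async(filename):
    # submit() returns a Future; inspect future.exception() if errors matter
    return executor.submit(os.remove, filename)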

An alternative that avoids Python threads, but requires more coding, is to spawn another process and send the filenames to its standard input through a pipe. This way you trade os.remove() for a synchronous os.write() (one write() syscall). It can be done using the deprecated os.popen(), and this usage of the function is perfectly safe because it only communicates in one direction, to the child process. A working prototype:

#!/usr/bin/python

from __future__ import print_function
import os, sys

def remover():
    # child process: read filenames from stdin and delete them
    for line in sys.stdin:
        filename = line.strip()
        try:
            os.remove(filename)
        except Exception: # ignore errors
            pass

def main():
    # when re-invoked with --remover-process, act as the deleter child
    if len(sys.argv) == 2 and sys.argv[1] == '--remover-process':
        return remover()

    # parent: spawn the child with a one-way pipe to its standard input
    remover_process = os.popen(sys.argv[0] + ' --remover-process', 'w')
    def remove_file(filename):
        print(filename, file=remover_process)
        remover_process.flush()

    for file in sys.argv[1:]:
        remove_file(file)

if __name__ == "__main__":
    main()
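If you want to avoid the deprecated os.popen(), the same one-way pipe can be built with the subprocess module; a sketch under the same --remover-process convention as the prototype above:

import subprocess, sys

# spawn this same script as the deleter child, with a writable pipe to its stdin
remover_process = subprocess.Popen(
    [sys.executable, sys.argv[0], '--remover-process'],
    stdin=subprocess.PIPE, universal_newlines=True)

def remove_file(filename):
    remover_process.stdin.write(filename + '\n')
    remover_process.stdin.flush()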
Answered by Maxim Egorushkin, Sep 20 '22

You can create a thread to delete files, following a common producer-consumer pattern:

import threading, Queue

dead_files = Queue.Queue()
END_OF_DATA = object() # a unique sentinel value

def background_deleter():
    import os
    while True:
        path = dead_files.get()
        if path is END_OF_DATA:
            return
        try:
            os.remove(path)
        except:  # add the exceptions you want to ignore here
            pass # or log the error, or whatever

deleter = threading.Thread(target=background_deleter)
deleter.start()

# when you want to delete a file, do:
# dead_files.put(file_path)

# when you want to shut down cleanly,
dead_files.put(END_OF_DATA)
deleter.join()

CPython releases the GIL (global interpreter lock) around internal file deletion calls, so this should be effective.
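For illustration, here is how a hypothetical producer might hand temporary files to the deleter instead of removing them inline (the tempfile usage is an assumption about the caller's code):

import os, tempfile

# create and use a scratch file, then queue its path for background deletion
fd, path = tempfile.mkstemp()
os.write(fd, b"scratch data")
os.close(fd)
dead_files.put(path)  # returns immediately; the deleter thread does the work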

Edit - new text

I would advise against spawning a new process per delete. On some platforms, process creation is quite expensive. I would also advise against spawning a new thread per delete: in a long-running program, you really never want the possibility of creating an unbounded number of threads at any point. Depending on how quickly file-deletion requests pile up, that could happen here.

The "solution" above is wordier than the others, because it avoids all that. There's only one new thread total. Of course it could easily be generalized to use any fixed number of threads instead, all sharing the same dead_files queue. Start with 1, add more if needed ;-)

Answered by Tim Peters, Sep 20 '22