
How do you check when a file is done being copied in Python?

I'd like to find a way to let a Python script know that a file has finished copying. Here is the scenario:

  1. A folder, to_print, is watched by the script, which constantly polls it with os.listdir().

  2. Whenever os.listdir() returns a list containing a file that hasn't been seen before, the script performs some operations on that file, which include opening it and manipulating its contents.

This is fine when the file is small, and copying the file from its original source to the directory being watched takes less time than the amount of time remaining until the next poll by os.listdir(). However, if a file is polled and found, but it is still in the process of being copied, then the file contents are corrupt when the script tries to act on it.

Instead, I'd like to be able to tell (using os.stat or otherwise) that a file is still being copied and, if so, wait until the copy is done before acting on it.

My current idea is to call os.stat() whenever I find a new file, wait until the next poll, and compare the modified/created times with those from the previous poll. If they are unchanged, the file is "stable"; otherwise I keep polling until it is. I'm not sure this will work, though, as I am not too familiar with how Linux/Unix updates these values.
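
A minimal sketch of that idea, for illustration only. The helper names snapshot and wait_until_stable, and the interval and checks parameters, are hypothetical and not from any library. It compares size and modification time between polls, which is a heuristic: a copy that stalls for longer than the polling interval would also look "stable".

```python
import os
import time


def snapshot(path):
    """Return the (size, mtime) pair that is compared between polls."""
    st = os.stat(path)
    return (st.st_size, st.st_mtime)


def wait_until_stable(path, interval=1.0, checks=2):
    """Block until size and mtime are unchanged for `checks` consecutive polls.

    Hypothetical helper: `interval` is the delay between polls in seconds.
    Returns the last (size, mtime) pair that was observed.
    """
    last = snapshot(path)
    unchanged = 0
    while unchanged < checks:
        time.sleep(interval)
        current = snapshot(path)
        if current == last:
            unchanged += 1
        else:
            # The file changed since the previous poll, so start counting again.
            unchanged = 0
            last = current
    return last
```

The script could call wait_until_stable() on each newly seen file before opening it, at the cost of delaying every file by at least interval * checks seconds.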

asked Oct 10 '12 16:10 by emish

1 Answer

Try inotify.

inotify is the Linux kernel's facility for watching filesystem events. For your use case the IN_CLOSE_WRITE event looks promising: it is reported when a file that was opened for writing is closed. There is a Python library for inotify, pyinotify. Here is a very simple example taken from its documentation; you'll need to modify it to catch only IN_CLOSE_WRITE events.

# Example: loops monitoring events forever.
import pyinotify

# Instantiate a new WatchManager (will be used to store watches).
wm = pyinotify.WatchManager()

# Associate this WatchManager with a Notifier (will be used to report and
# process events).
notifier = pyinotify.Notifier(wm)

# Add a new watch on /tmp for ALL_EVENTS.
wm.add_watch('/tmp', pyinotify.ALL_EVENTS)  # <-- replace with pyinotify.IN_CLOSE_WRITE

# Loop forever and handle events.
notifier.loop()

Extensive API documentation is available here: http://seb-m.github.com/pyinotify/
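
As a rough sketch of that modification, the watch can be restricted to IN_CLOSE_WRITE and paired with a handler that receives each finished file. The names watch_closed_files, on_closed and CloseWriteHandler are made up for illustration; the pyinotify calls are the same ones used in the example above, plus a ProcessEvent subclass. This assumes the copying program writes directly into the watched folder; a tool that writes to a temporary name and then renames the file would produce IN_MOVED_TO instead.

```python
def watch_closed_files(directory, on_closed):
    """Call on_closed(path) whenever a file in `directory` is closed after writing.

    Hypothetical helper; this call blocks and loops until interrupted.
    """
    # Imported lazily so this module still loads where pyinotify
    # (third-party, Linux only) is not installed.
    import pyinotify

    class CloseWriteHandler(pyinotify.ProcessEvent):
        # pyinotify dispatches each event to the method named process_<EVENT>.
        def process_IN_CLOSE_WRITE(self, event):
            on_closed(event.pathname)

    wm = pyinotify.WatchManager()
    notifier = pyinotify.Notifier(wm, CloseWriteHandler())
    # Only report files that were open for writing and have now been closed.
    wm.add_watch(directory, pyinotify.IN_CLOSE_WRITE)
    notifier.loop()
```

For the folder in the question, calling watch_closed_files('to_print', print) would print the path of each file once its writer has closed it, so no polling with os.listdir() is needed.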

answered Oct 03 '22 18:10 by nalply