
Will Python open a file before it's finished writing?

Tags: python, file, linux

I am writing a script that will be polling a directory looking for new files.

In this scenario, is it necessary to do some sort of error checking to make sure the files are completely written prior to accessing them?

I don't want to work with a file before it has been written completely to disk, but because the info I want from the file is near the beginning, it seems like it could be possible to pull the data I need without realizing the file isn't done being written.

Is that something I should worry about, or will the file be locked because the OS is writing to the hard drive?

This is on a Linux system.

CasualScience asked Dec 27 '22 19:12


2 Answers

Typically on Linux, unless you're using locking of some kind, two processes can quite happily have the same file open at once, even for writing. There are three ways of avoiding problems with this:

  1. Locking

    By having the writer apply a lock to the file, it is possible to prevent the reader from reading the file partially. However, most locks are advisory, so a reader that does not check the lock can still see partial results. (Mandatory locks exist, but are strongly not recommended on the grounds that they're far too fragile.) It's relatively difficult to write correct locking code, and it is normal to delegate such tasks to a specialist library (e.g., a database engine!). In particular, you don't want to use locking on networked filesystems; it's a source of colossal trouble when it works and can often go thoroughly wrong.

  2. Convention

    A file can instead be created in the same directory with another name that you don't automatically look for on the reading side (e.g., .foobar.txt.tmp) and then renamed atomically to the right name (e.g., foobar.txt) once the writing is done. This can work quite well, so long as you take care to deal with the possibility of previous runs failing to correctly write the file. If there should only ever be one writer at a time, this is fairly simple to implement.

  3. Not Worrying About It

    The most common type of file that is frequently written is a log file. These can be easily written in such a way that information is strictly only ever appended to the file, so any reader can safely look at the beginning of the file without having to worry about anything changing under its feet. This works very well in practice.
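The rename convention from option 2 can be sketched in Python roughly as follows. This is a minimal illustration, not a definitive implementation: the function names `write_atomically` and `list_finished_files` are made up for this example, the `.name.tmp` pattern follows the `.foobar.txt.tmp` example above, and it assumes a single writer with the temporary file and the final file in the same directory (and therefore on the same filesystem).

```python
import os


def write_atomically(directory, name, data):
    """Write data under a hidden temporary name, then rename it into place.

    Hypothetical helper: assumes one writer at a time, so a leftover
    temporary file from a failed earlier run is simply overwritten.
    """
    tmp_path = os.path.join(directory, "." + name + ".tmp")
    final_path = os.path.join(directory, name)
    with open(tmp_path, "w") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the data to disk first
    # On POSIX, os.replace is an atomic rename when both paths are on the
    # same filesystem, so readers see either no file or the complete file.
    os.replace(tmp_path, final_path)
    return final_path


def list_finished_files(directory):
    """Return the names a polling reader should look at, skipping temp files."""
    return sorted(
        n for n in os.listdir(directory)
        if not (n.startswith(".") and n.endswith(".tmp"))
    )
```

A polling reader would then call `list_finished_files` on each pass and only ever open names that have already been renamed into place.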

There's nothing special about Python in any of this. All programs running on Linux have the same issues.

Donal Fellows answered Jan 04 '23 16:01


On Unix, unless the writing application goes out of its way, the file won't be locked and you'll be able to read from it.

The reader will, of course, have to be prepared to deal with an incomplete file (bearing in mind that there may be I/O buffering happening on the writer's side).

If that's a non-starter, you'll have to think of some scheme to synchronize the writer and the reader, for example:

  • explicitly lock the file;
  • write the data to a temporary location and only move it into its final place when the file is complete (the move operation can be done atomically, provided both the source and the destination reside on the same file system).
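The first option, explicit locking, might look something like the sketch below, which uses advisory locks via `fcntl.flock` from the Python standard library (Unix only). The helper names `write_locked` and `read_locked` are invented for this example, and the scheme only works if both the writer and the reader cooperate by taking the lock; a program that ignores the lock can still read a partial file. It also assumes a local filesystem.

```python
import fcntl


def write_locked(path, data):
    """Replace the contents of path while holding an exclusive advisory lock."""
    # Open in append mode so the file is not truncated before we hold the lock.
    with open(path, "a") as f:
        fcntl.flock(f, fcntl.LOCK_EX)  # blocks until no one else holds a lock
        try:
            f.truncate(0)  # safe to discard old contents now
            f.write(data)
            f.flush()
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)


def read_locked(path):
    """Read path while holding a shared advisory lock."""
    with open(path) as f:
        fcntl.flock(f, fcntl.LOCK_SH)  # waits while a writer holds LOCK_EX
        try:
            return f.read()
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)
```

One gap this sketch leaves open: a reader that polls a directory can find and lock a newly created file before the writer has taken its lock, and will then see an empty file. The temporary-file-and-move approach does not have that problem.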
NPE answered Jan 04 '23 17:01