
Python file objects, closing, and destructors

The description of tempfile.NamedTemporaryFile() says:

If delete is true (the default), the file is deleted as soon as it is closed.

In some circumstances, this means that the file is not deleted when the Python interpreter exits. For example, when running the following test under py.test, the temporary file remains:

from __future__ import division, print_function, absolute_import
import tempfile
import unittest2 as unittest


class cache_tests(unittest.TestCase):
    def setUp(self):
        self.dbfile = tempfile.NamedTemporaryFile()

    def test_get(self):
        self.assertEqual('foo', 'foo')

In a way this makes sense, because this program never explicitly closes the file object. The only other way for the object to get closed would presumably be in the __del__ destructor, but here the language reference states that "It is not guaranteed that __del__() methods are called for objects that still exist when the interpreter exits." So everything is consistent with the documentation so far.
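One way to avoid relying on the destructor at all is to close the file explicitly when each test finishes. The sketch below assumes the standard-library unittest module (unittest2 backports the same TestCase.addCleanup API) and renames the class to CacheTests; the extra assertion in test_get is only there to show that the file exists while the test runs.

```python
import os
import tempfile
import unittest


class CacheTests(unittest.TestCase):
    def setUp(self):
        # Same fixture as in the question, but close() is registered as a
        # cleanup, so it runs after every test (even a failing one) instead
        # of being left to __del__ at some unspecified time.
        self.dbfile = tempfile.NamedTemporaryFile()
        self.addCleanup(self.dbfile.close)

    def test_get(self):
        self.assertEqual('foo', 'foo')
        self.assertTrue(os.path.exists(self.dbfile.name))


if __name__ == "__main__":
    unittest.main()
```

Because delete=True is the default, the close() performed by the cleanup is also what removes the file.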

However, I'm confused about the implications of this. If it is not guaranteed that file objects are closed on interpreter exit, can it possibly happen that some data that was successfully written to a (buffered) file object is lost even though the program exits gracefully, because it was still in the file object's buffer, and the file object never got closed?

Somehow that seems very unlikely and un-pythonic to me, and the open() documentation doesn't contain any such warnings either. So I (tentatively) conclude that file objects are, after all, guaranteed to be closed.

But how does this magic happen, and why can't NamedTemporaryFile() use the same magic to ensure that the file is deleted?

Edit: Note that I am not talking about file descriptors here (which are buffered by the OS and closed by the OS on program exit), but about Python file objects that may implement their own buffering.
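To illustrate the distinction drawn here, the sketch below (size_before_and_after_flush is a hypothetical helper) writes through an ordinary Python text file object and checks the size on disk before and after flush(). Under CPython, a small write stays in the file object's own buffer until it is flushed, so the operating system has not seen any data yet; the exact threshold depends on the buffer size, so treat this as an illustration rather than a guarantee.

```python
import os
import tempfile


def size_before_and_after_flush(text):
    """Write `text` through a buffered Python file object and report the
    on-disk size before and after an explicit flush()."""
    fd, path = tempfile.mkstemp()
    os.close(fd)  # only the path is needed; reopen it through open() below
    try:
        with open(path, "w", encoding="utf-8") as f:
            f.write(text)
            # The data is still in the file object's buffer, so no write()
            # system call has been made for it yet.
            before = os.path.getsize(path)
            f.flush()  # hand the buffered data to the operating system
            after = os.path.getsize(path)
        return before, after
    finally:
        os.remove(path)
```

Under CPython, size_before_and_after_flush("hello") returns (0, 5).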

asked Apr 10 '13 05:04 by Nikratio


People also ask

Do Python objects have destructors?

A destructor method is called when all references to an object have been deleted. In Python, the __del__() method is called a destructor method. Destructors aren't as important in Python as they are in C++, because Python has a garbage collector that manages memory automatically.

What does __del__ do in Python?

The __del__() method is known as a destructor method. It is called when an object is garbage collected, which happens after all references to the object have been deleted.
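A minimal sketch of when __del__ runs, assuming CPython's reference counting (other implementations such as PyPy may run it later, or not at all before exit). Resource and demo are hypothetical names used only for illustration.

```python
class Resource:
    """Toy class whose destructor records that it ran."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def __del__(self):
        # Runs when the object is about to be destroyed.
        self.log.append(self.name)


def demo():
    log = []
    r = Resource("r", log)
    alias = r                      # a second reference to the same object
    del r                          # one reference remains: nothing happens yet
    after_first_del = list(log)
    del alias                      # last reference gone: CPython calls __del__ now
    after_last_del = list(log)
    return after_first_del, after_last_del
```

Under CPython, demo() returns ([], ['r']).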

How are objects destroyed in Python?

You cannot manually destroy objects in Python; Python uses automatic memory management. When an object is no longer referenced, it becomes eligible for garbage collection. In CPython, which uses reference counting, an object is reclaimed immediately when its reference count reaches zero.
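Reference counting alone cannot reclaim objects that refer to each other, because their counts never reach zero. The sketch below (hypothetical Node class; CPython 3.4 or later assumed, where objects with __del__ in a cycle can be collected) builds such a cycle and shows that the destructors only run once the cycle collector finds it. This is the kind of cycle that can leave an object, and its __del__, pending until interpreter shutdown.

```python
import gc


class Node:
    """Hypothetical class whose destructor records its name in a shared list."""

    def __init__(self, name, log):
        self.name = name
        self.log = log
        self.other = None

    def __del__(self):
        self.log.append(self.name)


def cycle_demo():
    log = []
    was_enabled = gc.isenabled()
    gc.disable()  # keep the automatic cycle collector from running mid-demo
    try:
        a, b = Node("a", log), Node("b", log)
        a.other, b.other = b, a   # each node keeps the other alive
        del a, b                  # refcounts stay above zero
        before_collect = sorted(log)
        gc.collect()              # explicit collection still works while disabled
        after_collect = sorted(log)
    finally:
        if was_enabled:
            gc.enable()
    return before_collect, after_collect
```

Under CPython, cycle_demo() returns ([], ['a', 'b']).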

Why should you close file objects in Python?

Files are limited resources managed by the operating system, so making sure files are closed after use protects against hard-to-debug issues such as running out of file handles or ending up with corrupted data.
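The usual way to guarantee closing is a with statement. This sketch (write_with_context_manager is a hypothetical helper) shows that the file is closed, and its buffered data written out, even when an exception escapes the block.

```python
import os
import tempfile


def write_with_context_manager(path, text, fail=False):
    """Write `text` to `path` inside a with-block, optionally raising midway.

    Returns whether the file object was closed once the block ended.
    """
    f = None
    try:
        with open(path, "w", encoding="utf-8") as f:
            f.write(text)
            if fail:
                raise RuntimeError("simulated failure after write()")
    except RuntimeError:
        pass  # the with statement has already closed (and flushed) the file
    return f.closed


if __name__ == "__main__":
    directory = tempfile.mkdtemp()
    path = os.path.join(directory, "example.txt")
    print(write_with_context_manager(path, "saved", fail=True))  # True
    with open(path, encoding="utf-8") as fh:
        print(fh.read())  # saved
    os.remove(path)
    os.rmdir(directory)
```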


2 Answers

On Windows, NamedTemporaryFile uses a Windows-specific extension (os.O_TEMPORARY) to ensure that the file is deleted when it is closed. This probably also works if the process is killed in any way. However, there is no obvious equivalent on POSIX, most likely because on POSIX you can simply delete files that are still in use: this only deletes the name, and the file's content is only removed after it is closed (in any way). But if we want the file name to persist until the file is closed, as NamedTemporaryFile does, then we need "magic".
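The POSIX behaviour described above, where deleting an open file only removes its name, can be observed directly. This hypothetical helper is POSIX-only; on Windows, os.unlink() on an open file normally fails with PermissionError. tempfile.TemporaryFile() relies on the same property: its documentation says that under Unix the directory entry is either not created at all or removed immediately after the file is created.

```python
import os
import tempfile


def unlink_while_open():
    """POSIX-only sketch: remove a file's name while it is still open."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "w+") as f:
        os.unlink(path)                  # only the directory entry goes away
        name_exists = os.path.exists(path)
        f.write("still usable")          # the open file keeps working
        f.seek(0)
        contents = f.read()
    # Once the last descriptor is closed, the OS frees the file's data.
    return name_exists, contents
```

On Linux or macOS, unlink_while_open() returns (False, 'still usable').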

We cannot use the same magic as for flushing buffered files. What happens there is that the C library handles it (in Python 2): the files are FILE objects in C, and the C standard guarantees that they are flushed on normal program exit (but not if the process is killed). In Python 3, there is custom C code to achieve the same effect. But it is specific to this use case, not anything directly reusable.
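A rough way to see the "killed versus normal exit" distinction in Python 3 is to end a child interpreter with os._exit(), which skips interpreter cleanup, so data still sitting in a Python file object's buffer is never written. The child script and size_after_hard_exit are hypothetical; the sketch only demonstrates the abrupt-exit case and does not test what a normal exit does.

```python
import os
import subprocess
import sys
import tempfile

# Child program: write through a buffered file object, optionally flush,
# then leave via os._exit(), which skips destructors, atexit handlers and
# the flushing of Python-level buffers.
CHILD = r"""
import os, sys
f = open(sys.argv[1], "w", encoding="utf-8")
f.write("payload")
if sys.argv[2] == "flush":
    f.flush()
os._exit(0)
"""


def size_after_hard_exit(flush):
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        subprocess.run(
            [sys.executable, "-c", CHILD, path, "flush" if flush else "noflush"],
            check=True,
        )
        return os.path.getsize(path)
    finally:
        os.remove(path)
```

size_after_hard_exit(flush=False) returns 0, while size_after_hard_exit(flush=True) returns 7, the length of "payload".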

That's why NamedTemporaryFile uses a custom __del__. And indeed, __del__ methods are not guaranteed to be called when the interpreter exits. (We can prove it with a global reference cycle that also references a NamedTemporaryFile instance, or by running PyPy instead of CPython.)

As a side note, NamedTemporaryFile could be implemented a bit more robustly, e.g. by registering itself with atexit to ensure that the file name is removed at exit. But you can also do this yourself: if your process doesn't create an unbounded number of NamedTemporaryFile objects, it's simply atexit.register(my_named_temporary_file.close).
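A minimal sketch of that suggestion; named_temp_closed_at_exit is a hypothetical helper, not part of tempfile. Calling close() a second time does nothing, so the registered handler does not interfere with closing the file earlier. The atexit registry holds a reference to each file until exit, which is why this only suits a bounded number of temporary files, and, like any atexit handler, it does not run if the process is killed by a signal or leaves via os._exit().

```python
import atexit
import os
import tempfile


def named_temp_closed_at_exit(**kwargs):
    """Create a NamedTemporaryFile and make sure close() (and therefore the
    deletion done by delete=True) also runs at normal interpreter exit,
    instead of relying on __del__ alone."""
    tmp = tempfile.NamedTemporaryFile(**kwargs)
    atexit.register(tmp.close)
    return tmp


if __name__ == "__main__":
    tmp = named_temp_closed_at_exit(suffix=".db")
    tmp.write(b"cache data")
    print(tmp.name, "exists:", os.path.exists(tmp.name))
    # No explicit close(): the registered handler deletes the file at exit.
```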

answered Sep 22 '22 10:09 by Armin Rigo


On any version of *nix, all file descriptors are closed when a process finishes, and this is taken care of by the operating system. Windows is likely exactly the same in this respect. Without digging into the source code, I can't say with 100% certainty what actually happens, but what likely happens is:

  • If delete is False, unlink() (or a function similar to it on other operating systems) is called. This means that the file will automatically be deleted when the process exits and there are no more open file descriptors. While the process is running, the file will still remain around.

  • If delete is True, likely the C function remove() is used. This will forcibly delete the file before the process exits.
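Rather than inferring the mechanism, the observable effect of the delete flag can be checked directly. This hypothetical helper only shows what happens at close() time; it does not reveal which system call tempfile uses internally.

```python
import os
import tempfile


def delete_flag_behaviour():
    # delete=True (the default): the file disappears as soon as it is closed.
    with tempfile.NamedTemporaryFile() as f:
        default_name = f.name
        exists_while_open = os.path.exists(default_name)
    removed_on_close = not os.path.exists(default_name)

    # delete=False: closing leaves the file in place; removal is up to us.
    g = tempfile.NamedTemporaryFile(delete=False)
    g.close()
    kept_after_close = os.path.exists(g.name)
    os.remove(g.name)  # manual cleanup

    return exists_while_open, removed_on_close, kept_after_close
```

delete_flag_behaviour() returns (True, True, True).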

answered Sep 19 '22 10:09 by Yuushi