The description of tempfile.NamedTemporaryFile() says:
"If delete is true (the default), the file is deleted as soon as it is closed."
In some circumstances, this means that the file is not deleted after the Python interpreter ends. For example, when running the following test under py.test, the temporary file remains:
    from __future__ import division, print_function, absolute_import
    import tempfile
    import unittest2 as unittest

    class cache_tests(unittest.TestCase):
        def setUp(self):
            # The file object is created here but never explicitly closed.
            self.dbfile = tempfile.NamedTemporaryFile()

        def test_get(self):
            self.assertEqual('foo', 'foo')
In some way this makes sense, because this program never explicitly closes the file object. The only other way for the object to get closed would presumably be in the __del__ destructor, but here the language reference states that "It is not guaranteed that __del__() methods are called for objects that still exist when the interpreter exits." So everything is consistent with the documentation so far.
However, I'm confused about the implications of this. If it is not guaranteed that file objects are closed on interpreter exit, can it possibly happen that some data that was successfully written to a (buffered) file object is lost even though the program exits gracefully, because it was still in the file object's buffer, and the file object never got closed?
Somehow that seems very unlikely and un-pythonic to me, and the open() documentation doesn't contain any such warnings either. So I (tentatively) conclude that file objects are, after all, guaranteed to be closed.
But how does this magic happen, and why can't NamedTemporaryFile() use the same magic to ensure that the file is deleted?
Edit: Note that I am not talking about file descriptors here (that are buffered by the OS and closed by the OS on program exit), but about Python file objects that may implement their own buffering.
In Python, the __del__() method is known as a destructor method. It is called when an object is garbage collected, which happens after all references to the object have been deleted. Destructors aren't as important in Python as they are in C++, because Python's automatic memory management reclaims objects for you. You cannot manually destroy objects in Python. When an object is no longer referenced, it is free to be garbage collected; in CPython, which uses reference counting, an object is reclaimed immediately when its reference count reaches zero.
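The difference between immediate reclamation and delayed collection can be seen in a small experiment. This is a minimal sketch that assumes CPython 3.4 or later; the Tracked class and the variable names are hypothetical and exist only for illustration. Other interpreters, such as PyPy, may run __del__ at a different time.

```python
import gc


class Tracked:
    """Hypothetical class that records when its __del__ method runs."""
    log = []

    def __init__(self, name):
        self.name = name

    def __del__(self):
        Tracked.log.append(self.name)


# No cycle: on CPython the object is reclaimed as soon as the last
# reference goes away, so __del__ runs right at the del statement.
a = Tracked("plain")
del a
plain_finalized_immediately = "plain" in Tracked.log

# Reference cycle: the reference count never drops to zero on its own,
# so __del__ only runs once the cycle collector gets to the object.
gc.disable()              # keep the automatic collector out of the way
b = Tracked("cyclic")
b.self_ref = b            # the object now references itself
del b
cyclic_finalized_before_collect = "cyclic" in Tracked.log
gc.collect()              # explicit collection breaks the cycle
cyclic_finalized_after_collect = "cyclic" in Tracked.log
gc.enable()
```

An object caught in such a cycle at interpreter exit is exactly the kind of case where __del__ may never run.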
This is also why it is important to close files in Python. Because open files are limited resources managed by the operating system, making sure files are closed after use protects against hard-to-debug issues like running out of file handles or experiencing corrupted data.
On Windows, NamedTemporaryFile uses a Windows-specific extension (os.O_TEMPORARY) to ensure that the file is deleted when it is closed. This probably also works if the process is killed in any way. However, there is no obvious equivalent on POSIX, most likely because on POSIX you can simply delete files that are still in use: doing so only removes the name, and the file's content is removed only after the file is closed (in any way). But if we want the file name to persist until the file is closed, as with NamedTemporaryFile, then we need "magic".
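The POSIX behaviour described here can be tried out directly. A minimal sketch, assuming a POSIX system (on Windows, unlinking a file that is still open normally fails); the variable names are illustrative only:

```python
import os
import tempfile

# Create a real file on disk and keep a handle open on it.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "w+") as f:
    os.unlink(path)                       # removes only the name
    name_gone = not os.path.exists(path)

    # The data is still reachable through the open handle.
    f.write("still writable")
    f.seek(0)
    content = f.read()
# Leaving the with block closes the handle; only then is the storage freed.
```

This is the trick that makes anonymous temporary files easy on POSIX, and it is also why it does not help when the name has to stay visible while the file is open.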
We cannot use the same magic as for flushing buffered files. There, the C library handles it (in Python 2): the files are FILE objects in C, and the C standard guarantees that they are flushed on normal program exit (but not if the process is killed). In Python 3, there is custom C code to achieve the same effect. But it is specific to this use case, not anything directly reusable.
That's why NamedTemporaryFile uses a custom __del__. And indeed, __del__ methods are not guaranteed to be called when the interpreter exits. (We can prove it with a global cycle of references that also references a NamedTemporaryFile instance, or by running PyPy instead of CPython.)
As a side note, NamedTemporaryFile could be implemented a bit more robustly, e.g. by registering itself with atexit to ensure that the file name is removed then. But you can do that yourself too: if your process doesn't use an unbounded number of NamedTemporaryFiles, it's simply atexit.register(my_named_temporary_file.close).
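Spelled out, the atexit approach might look like the following. This is a minimal sketch rather than a recommended pattern for every program; the name tmp and the data written are illustrative only.

```python
import atexit
import os
import tempfile

tmp = tempfile.NamedTemporaryFile()

# Ask the interpreter to close (and therefore delete) the file at normal
# exit. Calling close() again on an already closed file is harmless.
atexit.register(tmp.close)

tmp.write(b"scratch data")
tmp.flush()
exists_while_open = os.path.exists(tmp.name)
```

Handlers registered with atexit run on normal interpreter exit, not when the process is killed by a signal it does not handle. In the unittest case from the question, calling self.addCleanup(self.dbfile.close) in setUp has a similar effect for each test.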
On any version of *nix, all file descriptors are closed when a process finishes, and this is taken care of by the operating system. Windows is likely exactly the same in this respect. Without digging in the source code, I can't say with 100% authority what actually happens, but likely what happens is:
If delete is False, nothing removes the file: it stays on disk after it is closed and after the process exits, until you delete it yourself.
If delete is True, the file is removed when the file object is closed: on POSIX through an unlink()-style call, on Windows through the delete-on-close flag. If the file object is never closed, the name can be left behind, which is what the question observes.
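The two cases can be checked directly rather than inferred. A minimal sketch (the variable names are illustrative) that closes one file created with the default delete=True and one created with delete=False, then looks at whether each path still exists:

```python
import os
import tempfile

# delete=True (the default): the name is removed when the file is closed.
auto = tempfile.NamedTemporaryFile()
auto_path = auto.name
auto.close()
auto_exists_after_close = os.path.exists(auto_path)

# delete=False: the file is left in place and the caller must remove it.
kept = tempfile.NamedTemporaryFile(delete=False)
kept_path = kept.name
kept.close()
kept_exists_after_close = os.path.exists(kept_path)

os.unlink(kept_path)  # manual cleanup of the delete=False file
```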