I've been scripting something that has to do with scanning directories and noticed a severe memory leak when calling os.path.isdir, so I've tried the following snippet:
    import os

    def func():
        # A byte-string (non-unicode) path to an existing directory
        if not os.path.isdir(r'D:\Downloads'):
            return False

    while True:
        func()
Within a few seconds, the Python process reached 100MB RAM.
I'm trying to figure out what's going on. It seems like the huge memory leak is in effect only when the path is indeed a valid directory path (meaning the 'return False' is not executed). Also, it is interesting to see what happens in related calls, like os.path.isfile.
Thoughts?
Edit: I think I'm onto something. Although isfile and isdir are implemented in the genericpath module, on Windows isdir is imported from the builtin nt module. So I had to download the 2.7.3 source (which I should've done a long time ago...).
After a little bit of searching, I found the posix__isdir function in \Modules\posixmodule.c, which I assume is the 'isdir' function imported from nt.
This part of the function (and comment) caught my eye:
    if (PyArg_ParseTuple(args, "U|:_isdir", &po)) {
        Py_UNICODE *wpath = PyUnicode_AS_UNICODE(po);
        attributes = GetFileAttributesW(wpath);
        if (attributes == INVALID_FILE_ATTRIBUTES)
            Py_RETURN_FALSE;
        goto check;
    }
    /* Drop the argument parsing error as narrow strings
       are also valid. */
    PyErr_Clear();
It seems that it all boils down to a Unicode/ASCII handling bug.
I've just tried my snippet above with the path argument as a unicode string (i.e. u'D:\Downloads') - no memory leak whatsoever. haha.
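Based on that observation, a possible workaround is to make sure the path is a unicode string before it reaches os.path.isdir. The following is only a minimal sketch: to_text and isdir_text are hypothetical helper names (not part of the standard library), and it assumes the path can be decoded with the filesystem encoding.

```python
import os
import sys

def to_text(path):
    """Return path as a text (unicode) string.

    Hypothetical helper: byte strings are decoded with the filesystem
    encoding, falling back to UTF-8 if Python does not report one.
    """
    if isinstance(path, bytes):  # on Python 2, bytes is an alias for str
        encoding = sys.getfilesystemencoding() or 'utf-8'
        return path.decode(encoding)
    return path

def isdir_text(path):
    """Call os.path.isdir with a text path, so the byte-string branch is never taken."""
    return os.path.isdir(to_text(path))
```

For example, isdir_text('D:\\Downloads') would pass u'D:\\Downloads' to os.path.isdir on Python 2, which is the variant that showed no leak above.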
The root cause is a failure to call PyMem_Free on the path variable in the non-Unicode path:
    if (!PyArg_ParseTuple(args, "et:_isdir",
                          Py_FileSystemDefaultEncoding, &path))
        return NULL;
    attributes = GetFileAttributesA(path);
    if (attributes == INVALID_FILE_ATTRIBUTES)
        Py_RETURN_FALSE;
    check:
    if (attributes & FILE_ATTRIBUTE_DIRECTORY)
        Py_RETURN_TRUE;
    else
        Py_RETURN_FALSE;
As per the documentation on PyArg_ParseTuple:
et: Same as es ...
es: PyArg_ParseTuple() will allocate a buffer of the needed size, copy the encoded data into this buffer and adjust *buffer to reference the newly allocated storage. The caller is responsible for calling PyMem_Free() to free the allocated buffer after use.
It's a bug in Python's standard library (fixed in Python 3 by using bytes objects directly); file a bug report at http://bugs.python.org.