
Huge memory leak in repeated os.path.isdir calls?

I've been writing a script that scans directories and noticed a severe memory leak when calling os.path.isdir, so I tried the following snippet:

import os

def func():
    if not os.path.isdir(r'D:\Downloads'):
        return False

while True:
    func()

Within a few seconds, the Python process reached 100 MB of RAM.

I'm trying to figure out what's going on. The leak seems to occur only when the path is a valid directory (that is, when the 'return False' is not executed). It would also be interesting to see what happens with related calls, like os.path.isfile.
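One way to compare these cases side by side is a small measurement harness. The sketch below is only an illustration under stated assumptions: it uses tracemalloc, which exists only in Python 3.4+, so it cannot run on the 2.7.3 interpreter discussed here and can only check a modern one. It also sees only allocations made through Python's own allocators. The helper name net_growth and the missing-path name are hypothetical.

```python
import os
import tracemalloc

def net_growth(func, iterations=10000):
    """Return bytes still allocated after repeated calls, relative to the start."""
    tracemalloc.start()
    func()  # warm-up call so one-time caches are not counted
    before, _peak = tracemalloc.get_traced_memory()
    for _ in range(iterations):
        func()
    after, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return after - before

existing_dir = os.getcwd()  # stand-in for a valid directory
missing_path = os.path.join(existing_dir, "no_such_entry_hypothetical")

cases = [
    ("isdir, valid directory", lambda: os.path.isdir(existing_dir)),
    ("isdir, missing path", lambda: os.path.isdir(missing_path)),
    ("isfile, valid directory", lambda: os.path.isfile(existing_dir)),
]
for label, call in cases:
    print("%-24s %d bytes" % (label, net_growth(call)))
```

A call that leaks a buffer per invocation would show growth roughly proportional to the iteration count; a healthy call should stay near zero.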

Thoughts?

Edit: I think I'm onto something. Although isfile and isdir are both implemented in the genericpath module, on Windows isdir is imported from the builtin nt module. So I had to download the 2.7.3 source (which I should've done a long time ago...).

After a little searching, I found the posix__isdir function in \Modules\posixmodule.c, which I assume is the 'isdir' function imported from nt.

This part of the function (and comment) caught my eye:

    if (PyArg_ParseTuple(args, "U|:_isdir", &po)) {
        Py_UNICODE *wpath = PyUnicode_AS_UNICODE(po);

        attributes = GetFileAttributesW(wpath);
        if (attributes == INVALID_FILE_ATTRIBUTES)
            Py_RETURN_FALSE;
        goto check;
    }
    /* Drop the argument parsing error as narrow strings
       are also valid. */
    PyErr_Clear();

It seems that it all boils down to a Unicode/ASCII handling bug.

I've just tried my snippet above with the path argument as a unicode string (i.e. u'D:\Downloads') - no memory leak whatsoever. haha.
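As a stopgap based on that observation, the call can be wrapped so that it always receives a unicode path. This is a minimal sketch, not a fix for the underlying bug: the helper name isdir_text is hypothetical, and it assumes the byte string can be decoded with the filesystem encoding.

```python
import os
import sys

def isdir_text(path):
    """Call os.path.isdir with a text (unicode) path, decoding byte strings first."""
    # On Python 2, plain 'str' literals are byte strings, so they are decoded here.
    if isinstance(path, bytes):
        path = path.decode(sys.getfilesystemencoding() or 'utf-8')
    return os.path.isdir(path)

print(isdir_text(os.getcwd()))
```

With this wrapper, the loop in the question would call isdir_text(r'D:\Downloads') instead of os.path.isdir directly, which should route the call through the Unicode branch shown above.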

AAlon asked Sep 28 '12 23:09



1 Answer

The root cause is a failure to call PyMem_Free on the path variable in the non-Unicode code path:

    if (!PyArg_ParseTuple(args, "et:_isdir",
                          Py_FileSystemDefaultEncoding, &path))
        return NULL;

    attributes = GetFileAttributesA(path);
    if (attributes == INVALID_FILE_ATTRIBUTES)
        Py_RETURN_FALSE;

check:
    if (attributes & FILE_ATTRIBUTE_DIRECTORY)
        Py_RETURN_TRUE;
    else
        Py_RETURN_FALSE;

As per the documentation on PyArg_ParseTuple:

  • et: Same as es...
  • es: PyArg_ParseTuple() will allocate a buffer of the needed size, copy the encoded data into this buffer and adjust *buffer to reference the newly allocated storage. The caller is responsible for calling PyMem_Free() to free the allocated buffer after use.

It's a bug in Python's standard library (fixed in Python 3 by using bytes objects directly); file a bug report at http://bugs.python.org.

nneonneo answered Oct 19 '22 00:10
