I'm trying to determine whether there are any reference-counting memory leaks in a Python C extension module. Consider this very simple test extension that leaks a date object:
#include <Python.h>
#include <datetime.h>

static PyObject* memleak(PyObject *self, PyObject *args) {
    PyDate_FromDate(2000, 1, 1); /* deliberately create a memory leak */
    Py_RETURN_NONE;
}

static PyMethodDef memleak_methods[] = {
    {"memleak", memleak, METH_NOARGS, "Leak some memory"},
    {NULL, NULL, 0, NULL} /* Sentinel */
};

PyMODINIT_FUNC initmemleak(void) {
    PyDateTime_IMPORT;
    Py_InitModule("memleak", memleak_methods);
}
PyDate_FromDate returns a new reference (the new object starts with a reference count of 1), and since I never call Py_DECREF on it, this object will never be freed.
However, when I call this function, the number of objects being tracked by the garbage collector doesn't seem to change before and after the function call:
Python 2.7.3 (default, Apr 10 2013, 05:13:16)
[GCC 4.7.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from memleak import memleak
>>> import gc
>>> gc.disable()
>>> gc.collect()
0
>>> len(gc.get_objects()) # get object count before
3581
>>> memleak()
>>> gc.collect()
0
>>> len(gc.get_objects()) # get object count after
3581
And I can't seem to find the leaked date object at all in the list of objects returned by gc.get_objects():
>>> from datetime import date
>>> print [obj for obj in gc.get_objects() if isinstance(obj, date)]
[]
Am I missing something here about how gc.get_objects() works? Is there another way to demonstrate that the memleak() function has a memory leak?
The most direct way to check for a memory leak in your application is to watch the process's memory usage over time and compare the total memory in use against the total available. In a production environment, it also helps to capture heap snapshots so that they can be compared.
Python programs, like programs in other languages, can leak memory. A leak happens when objects that are no longer used are never freed, for example because the garbage collector cannot reclaim them or, in a C extension, because a reference count is never decremented.
From the documentation of the gc module:
Since the collector supplements the reference counting already used in Python, you can disable the collector if you are sure your program does not create reference cycles.
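So the gc module only deals with reference cycles, and it only tracks container objects, i.e. objects that can hold references to other objects and could therefore be part of a cycle. A date object cannot hold such references, so it can never be part of a cycle and the collector does not track it. That is why the leaked date object isn't returned by the get_objects function.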
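To check whether the collector tracks a particular object, gc.is_tracked (available since Python 2.7) can be used. The snippet below is only a small sketch, assuming CPython with its C implementation of datetime; the variable names are mine:

```python
import gc
from datetime import date

# A date holds no references to other objects, so CPython's cyclic
# collector has no reason to track it.
tracked_date = gc.is_tracked(date(2000, 1, 1))

# A list is a container and is tracked by the collector.
tracked_list = gc.is_tracked([])

print("date tracked: %s" % tracked_date)
print("list tracked: %s" % tracked_list)
```

An object that is not tracked never shows up in gc.get_objects(), whether or not it has been leaked.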
In fact, old versions of Python did not have the garbage collector at all; they only used reference counting. The garbage collector was introduced to avoid memory leaks caused by reference cycles, since cycles are easy to create from the Python side and you do not want pure-Python programs to leak memory.
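For contrast, here is the kind of garbage that the gc module does handle. This is a minimal sketch with a made-up Node class: two objects refer to each other, so reference counting alone can never free them, but the collector finds them.

```python
import gc

class Node(object):
    """Hypothetical container class used only for this illustration."""
    pass

gc.disable()   # make sure no automatic collection runs in between
gc.collect()   # start from a clean state

a = Node()
b = Node()
a.other = b    # a -> b
b.other = a    # b -> a, which closes the cycle
del a, b       # the names are gone, but each object still has a reference

# The collector detects the unreachable cycle and reports how many
# objects it found.
unreachable = gc.collect()
gc.enable()

print("unreachable objects found: %d" % unreachable)
```

The leaked date object from the question is a different situation: nothing refers to it, but its reference count was never decremented, so neither mechanism frees it.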
To see that kind of memory leak you should call the memleak function in a loop and see that the memory used increases (slowly in your case).
There are also some third-party libraries that can be used to profile memory usage; see the "Which Python memory profiler is recommended?" question on Stack Overflow.