Two sections of the Python 2.7 documentation mention adding cyclic garbage collection (CGC) support for container objects defined in extension modules.
The Python/C API Reference Manual gives two rules:
- The memory for the object must be allocated using PyObject_GC_New() or PyObject_GC_NewVar().
- Once all the fields which may contain references to other containers are initialized, it must call PyObject_GC_Track().
In Extending and Embedding the Python Interpreter, however, the Noddy example suggests that adding the Py_TPFLAGS_HAVE_GC flag and filling in the tp_traverse and tp_clear slots is sufficient to enable CGC support, and the two rules above are NOT followed at all.
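To make concrete what the tp_traverse and tp_clear slots are for, here is a minimal Python 3 sketch. It drives CPython's collector from the Python side (it is not extension-module code), and the class name Node is hypothetical. It builds a reference cycle, which reference counting alone can never free, and uses a weak reference to observe that only gc.collect() reclaims it:

```python
import gc
import weakref


class Node:
    """Hypothetical class; CPython gives its type Py_TPFLAGS_HAVE_GC automatically."""

    def __init__(self):
        self.other = None


def make_cycle():
    a, b = Node(), Node()
    a.other, b.other = b, a          # a -> b -> a: a reference cycle
    return weakref.ref(a)            # observe `a` without keeping it alive


gc.disable()                         # stop automatic collections during the demo
try:
    probe = make_cycle()
    alive_before = probe() is not None   # refcounts alone cannot free the cycle
    gc.collect()                         # collector finds the cycle via tp_traverse
    alive_after = probe() is not None    # ...breaks it with tp_clear and frees it
finally:
    gc.enable()

print(alive_before, alive_after)     # True False
```

A C container type that sets Py_TPFLAGS_HAVE_GC and implements tp_traverse and tp_clear should behave the same way for cycles that run through its own fields.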
When I modified the Noddy example to actually follow the rules, pairing PyObject_GC_New() with PyObject_GC_Del() and PyObject_GC_Track() with PyObject_GC_UnTrack(), it surprisingly raised an assertion error:
Modules/gcmodule.c:348: visit_decref: Assertion "gc->gc.gc_refs != 0" failed. refcount was too small
This has left me confused about the correct, safe way to implement CGC. Could anyone give advice or, preferably, a neat example of a container object with CGC support?
Under most normal circumstances you shouldn't need to do the tracking/untracking yourself. This is described in the documentation, but it isn't made specifically clear. In the case of the Noddy example you definitely don't.
The short version is that a type object contains two function pointers: tp_alloc and tp_free. By default, tp_alloc calls all the right functions when an instance is created (if Py_TPFLAGS_HAVE_GC is set), and tp_free untracks the object on destruction.
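A quick way to observe this from Python is gc.is_tracked(), which reports whether the collector is currently tracking an object. A minimal Python 3 sketch, assuming CPython (tracking is an implementation detail); the class Plain is hypothetical and is used only because CPython gives such classes GC support automatically:

```python
import gc


class Plain:
    """Hypothetical class; its type has GC support, so tp_alloc tracks instances."""


obj = Plain()
print(gc.is_tracked(obj))   # True: allocated by PyType_GenericAlloc and tracked
print(gc.is_tracked([]))    # True: list is a GC container type
print(gc.is_tracked(42))    # False: int does not set Py_TPFLAGS_HAVE_GC
```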
The Noddy documentation says (at the end of the section):
"That’s pretty much it. If we had written custom tp_alloc or tp_free slots, we’d need to modify them for cyclic-garbage collection. Most extensions will use the versions automatically provided."
Unfortunately, the one place that doesn't make it clear that you don't need to do this yourself is the Supporting Cyclic Garbage Collection documentation.
Detail:
Noddy objects are allocated by a function called Noddy_new, which is placed in the tp_new slot of the type object. According to the documentation, the main thing the "new" function should do is call the tp_alloc slot. You typically don't write tp_alloc yourself; it simply defaults to PyType_GenericAlloc().
Looking at PyType_GenericAlloc() in the Python source shows several places where its behavior depends on PyType_IS_GC(type). First, it calls _PyObject_GC_Malloc instead of PyObject_Malloc, and second, it calls _PyObject_GC_TRACK(obj). [Note that all PyObject_New really does is call PyObject_Malloc and then PyObject_Init.]
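The timing of that tracking can be checked from Python too. In this minimal sketch (Python 3, CPython assumed; the class Probe and its tracked_at_alloc attribute are hypothetical), object.__new__ runs the type's tp_new, which calls tp_alloc, and the instance is already tracked before __init__ has set any of its fields:

```python
import gc


class Probe:
    """Hypothetical class used to observe when tracking happens."""

    tracked_at_alloc = None

    def __new__(cls):
        # object.__new__ runs the type's tp_new, which calls tp_alloc
        # (PyType_GenericAlloc for a class like this one).
        self = super().__new__(cls)
        cls.tracked_at_alloc = gc.is_tracked(self)
        return self

    def __init__(self):
        # __init__ (the tp_init slot) runs only after allocation has finished.
        self.children = []


p = Probe()
print(Probe.tracked_at_alloc)  # True
```

For a C type this means tp_traverse can run while the object's fields are still unset; PyType_GenericAlloc zero-fills the new object and Py_VISIT skips NULL pointers, so that is safe as long as the fields start out NULL.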
Similarly, on deallocation you call the tp_free slot, which is automatically set to PyObject_GC_Del for types with Py_TPFLAGS_HAVE_GC. PyObject_GC_Del includes code that does the same as PyObject_GC_UnTrack, so a call to untrack is unnecessary.
I am not experienced enough with the C API to give you advice myself, but there are plenty of examples in Python's own container implementations.
Personally, I'd start with the tuple implementation, since it's immutable: Objects/tupleobject.c. Then move on to the dict, list and set implementations, also under Objects/, for further notes on mutable containers.
I can't help but notice that there are calls to PyObject_GC_New(), PyObject_GC_NewVar() and PyObject_GC_Track() throughout, as well as Py_TPFLAGS_HAVE_GC being set.
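Those source files are also where tp_traverse is implemented for the built-in containers. gc.get_referents() runs an object's tp_traverse slot and collects whatever it visits, so a minimal Python 3 sketch (CPython assumed; the variable names are illustrative) can confirm that tuple, list and dict traversal each report the contained object:

```python
import gc

item = object()                  # the object each container's tp_traverse should visit
pair = (item, None)
items = [item]
mapping = {"key": item}

for container in (pair, items, mapping):
    # gc.get_referents() calls the type's tp_traverse slot with a visitor
    # that records every object passed to Py_VISIT
    refs = gc.get_referents(container)
    print(type(container).__name__, any(r is item for r in refs))
# tuple True
# list True
# dict True
```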