
Correct cyclic garbage-collection in extension modules

Two sections of the Python 2.7 documentation describe adding cyclic garbage collection (CGC) support for container objects defined in extension modules.

The Python/C API Reference Manual gives two rules:

  1. The memory for the object must be allocated using PyObject_GC_New() or PyObject_GC_NewVar().
  2. Once all the fields which may contain references to other containers are initialized, it must call PyObject_GC_Track().

In Extending and Embedding the Python Interpreter, however, the Noddy example suggests that setting the Py_TPFLAGS_HAVE_GC flag and filling in the tp_traverse and tp_clear slots is sufficient to enable CGC support. The two rules above are not followed at all.

When I modified the Noddy example to follow the rules explicitly, using PyObject_GC_New()/PyObject_GC_Del() and PyObject_GC_Track()/PyObject_GC_UnTrack(), it unexpectedly failed with this assertion error:

Modules/gcmodule.c:348: visit_decref: Assertion "gc->gc.gc_refs != 0" failed. refcount was too small

This has left me confused about the correct and safe way to implement CGC. Could anyone give advice or, preferably, a concise example of a container object with CGC support?

asked Sep 04 '12 00:09 by liuyu


2 Answers

Under most normal circumstances you shouldn't need to do the tracking/untracking yourself. This is described in the documentation, but it isn't made particularly clear. In the case of the Noddy example you definitely don't need to.

The short version is that a type object contains two function pointers, tp_alloc and tp_free. By default, tp_alloc calls all the right functions when an instance is created (if Py_TPFLAGS_HAVE_GC is set), and tp_free untracks the instance when it is destroyed.

The Noddy documentation says (at the end of the section):

That’s pretty much it. If we had written custom tp_alloc or tp_free slots, we’d need to modify them for cyclic-garbage collection. Most extensions will use the versions automatically provided.

Unfortunately, the one place that doesn't make it clear that you don't need to do this yourself is the Supporting Cyclic Garbage Collection documentation.


Detail:

Noddy objects are allocated by a function called Noddy_new, which is placed in the tp_new slot of the type object. According to the documentation, the main thing the "new" function should do is call the tp_alloc slot. You typically don't write tp_alloc yourself; it defaults to PyType_GenericAlloc().

Looking at PyType_GenericAlloc() in the Python source shows several places where it branches on PyType_IS_GC(type). First, it calls _PyObject_GC_Malloc instead of PyObject_Malloc; second, it calls _PyObject_GC_TRACK(obj). [Note that all PyObject_New really does is call PyObject_Malloc and then PyObject_Init, which only fills in the type pointer and reference count; it does not call tp_init.]
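The effect of the default tp_alloc can be observed from Python without writing any C. This is a minimal sketch, assuming CPython 3 and assuming that an ordinary Python class goes through the same PyType_GenericAlloc() path (CPython gives such classes Py_TPFLAGS_HAVE_GC and the default tp_alloc). The Node class is hypothetical, and Py_TPFLAGS_HAVE_GC is hard-coded to its bit value from CPython's object.h:

```python
import gc

# Bit value of Py_TPFLAGS_HAVE_GC in CPython's object.h (1 << 14).
Py_TPFLAGS_HAVE_GC = 1 << 14


class Node:
    """Hypothetical container class; CPython supplies its tp_alloc/tp_free."""

    def __init__(self):
        self.other = None


# The type carries the GC flag without us asking for it.
print(bool(Node.__flags__ & Py_TPFLAGS_HAVE_GC))  # True

# tp_new -> tp_alloc already tracks the object, before __init__ (tp_init)
# has run; no explicit "track" call appears anywhere in user code.
bare = Node.__new__(Node)
print(gc.is_tracked(bare))    # True
print(gc.is_tracked(Node()))  # True
```

The `Node.__new__(Node)` line is the interesting one: it runs only tp_new (and therefore tp_alloc), which is where the tracking happens.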

Similarly, on deallocation you call the tp_free slot, which is automatically set to PyObject_GC_Del for types with Py_TPFLAGS_HAVE_GC. PyObject_GC_Del includes code that does the same as PyObject_GC_UnTrack, so an explicit call to untrack is unnecessary.
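The deallocation side can be checked the same way. In this sketch (again assuming CPython 3; Node and make_cycle are hypothetical names), two objects refer to each other, so reference counting alone never frees them. gc.collect() breaks the cycle and the objects are then freed through the default slots, with no explicit untrack call in user code. The weak reference is only there to check that the object is gone:

```python
import gc
import weakref


class Node:
    def __init__(self):
        self.other = None


def make_cycle():
    """Build a two-object cycle and return a weak reference to one member."""
    a, b = Node(), Node()
    a.other, b.other = b, a  # a -> b -> a: refcounts never reach zero alone
    return weakref.ref(a)


probe = make_cycle()    # the cycle is now unreachable from Python code
gc.collect()            # the cyclic collector finds the cycle and frees it
print(probe() is None)  # True
```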

answered Sep 22 '22 21:09 by DavidW


I am not experienced enough in the C API myself to give you any advice, but there are plenty of examples in the Python container implementations themselves.

Personally, I'd start with the tuple implementation, Objects/tupleobject.c, since tuples are immutable. Then move on to the dict, list and set implementations for further notes on mutable containers:

  • Objects/dictobject.c
  • Objects/listobject.c
  • Objects/setobject.c

I can't help noticing that there are calls to PyObject_GC_New(), PyObject_GC_NewVar() and PyObject_GC_Track() throughout, and that these types set Py_TPFLAGS_HAVE_GC.
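The split between objects the collector tracks and objects it ignores can be observed from Python with gc.is_tracked(). A small sketch, assuming CPython 3: atomic values such as ints and strings are never tracked, while lists and sets are tracked as soon as they are created. Some containers, notably tuples and dicts, may be tracked lazily or untracked as an optimization depending on the CPython version, so they are left out here:

```python
import gc

# Objects that cannot refer to other objects are never tracked.
print(gc.is_tracked(0))      # False
print(gc.is_tracked("a"))    # False

# Mutable containers are tracked from the moment they are allocated.
print(gc.is_tracked([]))     # True
print(gc.is_tracked(set()))  # True
```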

answered Sep 26 '22 21:09 by Martijn Pieters