I came across an interesting discovery related to how SWIG handles reference counting of C structures that contain other structures as members.
I observed that my Python SWIG objects were getting garbage collected before I was done using them, in situations where I was storing data from structure sub-members in other Python objects (lists/dicts). After a fair bit of digging I discovered that SWIG-ed structure members did not seem to have their own independent reference counts, even though the interpreter indicates they are "Swig Objects". Therefore, when I added the data from the structure sub-element to my list, Python had no knowledge that I had added a reference to this data.
I have created a simple case to demonstrate. I SWIG-ed the following 3 structures:
SWIG-ed C Structures:
typedef struct
{
    unsigned long source;
    unsigned long destination;
} message_header;

typedef struct
{
    unsigned long data[120];
} message_large_body;

typedef struct
{
    message_header header;
    message_large_body body;
} large_message;
I then created a somewhat equivalent python class to compare the behavior to the purely SWIG-ed solution.
Somewhat Equivalent Python Class
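import bar  # the SWIG-generated module wrapping the structures above

class pyLargeMessage(object):
    def __init__(self):
        # Each member is a separately allocated SWIG object held by this instance
        self.header = bar.message_header()
        self.body = bar.message_large_body()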
I then ran the following test in the interpreter.
Python Interpreter Results
>>> y = pyLargeMessage()
>>> y
<__main__.pyLargeMessage object at 0x06C5E6B0>
>>> y.header
<Swig Object of type 'message_header *' at 0x06C5E700>
>>> sys.getrefcount(y.header)
3
>>> z = [y.header]
>>> sys.getrefcount(y.header)
3
>>> z += [y.header]
>>> sys.getrefcount(y.header)
4
>>>
>>> y = bar.large_message()
>>> y
<Swig Object of type 'large_message *' at 0x06C668E0>
>>> y.header
<Swig Object of type 'message_header *' at 0x06C66B60>
>>> sys.getrefcount(y.header)
1
>>> z = [y.header]
>>> sys.getrefcount(y.header)
1
>>> z += [y.header]
>>> sys.getrefcount(y.header)
1
>>>
The Python implementation behaved as I expected, but the pure SWIG implementation did not. Can someone explain what is going on here?
I've read through various sections of the SWIG documentation many times and can't find anything that directly explains this. I've learned a lot more about how things work, but I can't find any clear explanation of, or workaround for, the phenomenon above.
After thinking about it for a long while, re-reading the Structures and Classes, Proxy classes and Structure Data Members sections over and over, and looking at the generated wrapper code, I still can't figure out why the reference counts aren't handled normally.
The generated C code calls SWIG_NewPointerObj, which eventually (in most cases) calls PyObject_New, which in turn should (as the Python documentation says) return a new reference.
Generated SWIG code for the getter for the header member
SWIGINTERN PyObject *_wrap_large_message_header_get(PyObject *self, PyObject *args) {
  PyObject *resultobj = 0;
  large_message *arg1 = (large_message *) 0 ;
  void *argp1 = 0 ;
  int res1 = 0 ;
  message_header *result = 0 ;

  if (args && PyTuple_Check(args) && PyTuple_GET_SIZE(args) > 0) SWIG_fail;
  res1 = SWIG_ConvertPtr(self, &argp1,SWIGTYPE_p_large_message, 0 | 0 );
  if (!SWIG_IsOK(res1)) {
    SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "large_message_header_get" "', argument " "1"" of type '" "large_message *""'");
  }
  arg1 = (large_message *)(argp1);
  result = (message_header *)& ((arg1)->header);
  resultobj = SWIG_NewPointerObj(SWIG_as_voidptr(result), SWIGTYPE_p_message_header, 0 | 0 );
  return resultobj;
fail:
  return NULL;
}
As has been pointed out, the object returned by the getter for header and body is basically a lightweight proxy object that holds a pointer to the memory for the header/body inside the struct. It doesn't own that memory (it's still "owned" by the message object itself, or by the C library, depending on how you've created it) and it's not a copy.
Even if it were a copy, your call to sys.getrefcount would still always return 1, because each call to the getter would return a new copy.
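The behaviour can be imitated without SWIG. The following is only a rough pure-Python sketch: LargeMessage and HeaderProxy are hypothetical stand-ins, not the classes SWIG generates, and the dict merely plays the role of the struct's memory. It assumes CPython, where sys.getrefcount is meaningful.

```python
import sys


class HeaderProxy:
    """Hypothetical stand-in for the SWIG proxy: it only points at its parent's data."""

    def __init__(self, storage):
        self._storage = storage  # a view of the parent's data, not a copy of it


class LargeMessage:
    """Hypothetical stand-in for the wrapped large_message struct."""

    def __init__(self):
        self._header_storage = {"source": 0, "destination": 0}

    @property
    def header(self):
        # Like the generated getter, build a brand-new proxy on every access.
        return HeaderProxy(self._header_storage)


msg = LargeMessage()

# Two accesses yield two distinct proxy objects over the same underlying data.
first, second = msg.header, msg.header
distinct_proxies = first is not second
shared_storage = first._storage is second._storage

# Storing proxies in a list cannot change the count seen on a later access,
# because that later access measures yet another freshly created proxy.
count_before = sys.getrefcount(msg.header)
kept = [msg.header]
kept += [msg.header]
count_after = sys.getrefcount(msg.header)

print(distinct_proxies, shared_storage, count_before == count_after)
```

Each sys.getrefcount(msg.header) call here inspects a temporary that nobody else references, which is why the number never moves no matter how many earlier proxies were stored.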
From the Python perspective, if you want to ensure that you never have a dangling pointer, there are two ways it could be fixed:
1. Return a copy of header/body which owns the memory it points to.
2. Keep a reference to the message itself, so that even if the message is released its refcount can't hit 0 whilst there are proxy objects referring to parts of it.
I've put together an example of doing #2 with SWIG. Your header file remains unchanged, but the interface becomes:
%module test
%{
#include "test.h"
%}
%typemap(out) message_header * header %{
// This expands to resultobj = SWIG_NewPointerObj(...) exactly as before:
$result = SWIG_NewPointerObj(SWIG_as_voidptr($1), $1_descriptor, 0);
// This sets a reference to the parent object inside the child
PyObject_SetAttrString($result, "_parent", obj0);
%}
%include "test.h"
This is equivalent to saying:
z = y.header
z._parent = y
in Python.
With this in place we can now run:
import sys
import test  # the SWIG module built from the interface above

y = test.large_message()
print(sys.getrefcount(y))
print(y.header)
z = [y.header]
print(sys.getrefcount(y))
z += [y.header]
print(sys.getrefcount(y))
Which, as expected, shows the reference count for y increasing with every sub-object proxy that is created. Thus the memory that they refer to can't be freed prematurely (at least not by SWIG).
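The same parent-reference idea can be sketched in plain Python for anyone who wants to see the effect without building the module. This is an illustration under assumptions, not the SWIG-generated code: LargeMessage and HeaderProxy are hypothetical names, and the immediate release at the end relies on CPython's reference counting.

```python
import sys
import weakref


class HeaderProxy:
    """Hypothetical proxy that pins its parent, like the _parent attribute above."""

    def __init__(self, parent):
        self._parent = parent  # strong reference keeps the parent alive


class LargeMessage:
    """Hypothetical stand-in for the wrapped large_message struct."""

    @property
    def header(self):
        # Every access creates a new proxy that refers back to this object.
        return HeaderProxy(self)


msg = LargeMessage()

base = sys.getrefcount(msg)
kept = [msg.header]
after_one = sys.getrefcount(msg)   # one stored proxy holds one extra reference
kept += [msg.header]
after_two = sys.getrefcount(msg)   # two stored proxies hold two

# Dropping our own name for the parent does not free it while proxies exist.
alive = weakref.ref(msg)
del msg
still_alive = alive() is not None

# Once the last proxy goes away, the parent can finally be released.
kept.clear()
released = alive() is None

print(after_one - base, after_two - base, still_alive, released)
```

The proxy points at the parent but the parent does not point back, so no reference cycle is created and the parent is released as soon as the last proxy disappears.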
You can make that more generic and apply it to multiple types/members using %apply:
%module test
%{
#include "test.h"
%}
%typemap(out) SWIGTYPE * SUBOBJECT %{
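// Check that the parent object exists before storing a reference to it
assert(obj0);
$result = SWIG_NewPointerObj(SWIG_as_voidptr($1), $1_descriptor, 0);
// Keep the parent alive for as long as this child proxy exists
PyObject_SetAttrString($result, "_parent", obj0);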
%}
%apply SWIGTYPE * SUBOBJECT { message_header * header };
%apply SWIGTYPE * SUBOBJECT { message_large_body * body };
%include "test.h"