
Are dynamically created classes always "unreachable" for gc in Python?

I have a question regarding garbage collection in Python. After reading some insightful articles on why one might prefer to run a Python program with disabled garbage collection*, I decided to search and remove all circular references in my code to allow objects to be destroyed through ref-counting alone.

To find existing circular references, I put a call to gc.collect() in the tearDown method of my unittest cases and printed a warning whenever it returned a value greater than 0. Most of the issues found were easily fixed by refactoring or by using weak references.
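A minimal sketch of such a check might look like the following. The class names `CycleCheckingTestCase` and `ExampleTest` are hypothetical and not taken from the original code, and the sketch assumes CPython, where gc.collect() returns the number of unreachable objects it found:

```python
import gc
import unittest
import warnings


class CycleCheckingTestCase(unittest.TestCase):
    """Hypothetical base class: warns when a test leaves cyclic garbage behind."""

    def setUp(self):
        # Switch off automatic collection so cycles survive until tearDown,
        # and discard any garbage created before this test started.
        self._gc_was_enabled = gc.isenabled()
        gc.disable()
        gc.collect()

    def tearDown(self):
        unreachable = gc.collect()  # number of unreachable objects found
        if self._gc_was_enabled:
            gc.enable()
        if unreachable > 0:
            warnings.warn(
                "%s left %d unreachable objects" % (self.id(), unreachable)
            )


class ExampleTest(CycleCheckingTestCase):
    def test_self_referencing_list(self):
        cycle = []
        cycle.append(cycle)  # the list refers to itself, so ref-counting cannot free it
```

Running ExampleTest should emit one warning, because the self-referencing list can only be reclaimed by the collector.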

After a while though, I came across a rather curious problem, best expressed in code:

import gc
gc.disable()

def bar():
    class Foo( object ):
        pass

bar()
print( gc.collect() ) # prints 6

When removing the call to bar(), gc.collect() returns 0, as expected.

It seems like even though Foo is created within the scope of the function bar and never returned to the outside, it sticks around and causes the garbage collector to find unreachable objects.
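One way to observe this directly is to hold only a weak reference to the class and check whether it is still alive after bar() returns. This is a sketch that assumes CPython's reference-counting behaviour; the variable names are illustrative:

```python
import gc
import weakref

gc.disable()
gc.collect()  # start from a clean slate


def bar():
    class Foo(object):
        pass
    # A weak reference does not keep Foo alive on its own.
    return weakref.ref(Foo)


ref = bar()
# No strong reference to Foo is held by this code, yet the class is still alive.
alive_before = ref() is not None
gc.collect()
# After a full collection the weak reference is dead.
alive_after = ref() is not None
gc.enable()

print(alive_before, alive_after)
```

If the class were freed by ref-counting alone, the weak reference would already be dead before the call to gc.collect().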

When moving Foo outside the scope of bar, everything works fine again. That solution is however not applicable to the problem I am trying to solve in the affected code (dynamic creation of ctypes.Structures for serialization).
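For illustration only, a dynamically created ctypes.Structure of the kind described might look roughly like this. The function name `serialize` and the field layout are assumptions made for the sketch, not the original code:

```python
import ctypes


def serialize(fields, values):
    """Hypothetical helper: pack values into a struct type that is built on the fly."""

    # The class only exists for the duration of this call, just like Foo in bar().
    class Record(ctypes.Structure):
        _fields_ = fields

    # ctypes structures expose their raw memory through the buffer protocol.
    return bytes(Record(*values))


payload = serialize([("x", ctypes.c_int32), ("y", ctypes.c_int32)], (1, 2))
print(len(payload))
```

Because the field list is only known at call time, the class cannot simply be moved to module level.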

The following two approaches did not work either:

import gc
gc.disable()

def bar():
    type( "Foo", ( object, ), {} )

bar()
print( gc.collect() ) # prints 6 again

or even the very 'clever':

import gc
gc.disable()

import weakref

def bar():
    weakref.ref( type( "Foo", ( object, ), {} ) )

bar()
print( gc.collect() ) # still prints 6

To top it off, here's an example that actually works ... but only in Python2:

import gc
gc.disable()

def bar():
    class Foo(): # not subclassing object
        pass

bar()
print( gc.collect() ) # prints 0 - finally?

In Python3, however, the code above prints "6" again. I suspect that is because all user-defined classes are new-style classes in Python3.

So, am I stuck with Python2, weird "unreachable objects" in Python3 or do I have to follow up every call to bar with a manual garbage collection?

*(articles on running Python with gc.disable() )

http://pydev.blogspot.de/2014/03/should-python-garbage-collector-be.html
http://dsvensson.wordpress.com/2010/07/23/the-garbage-garbage-collector-of-python/


See roippi's answer for why the above does behave as expected.

For future reference though, here's a small workaround that will fix this particular problem. Not saying that disabling gc is the right thing for anyone to do, but if you feel like it's the right thing for you, this is how I did it:

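import functools
import gc
gc.disable()

def requiresGC( func ):
    # Run a manual collection after every call to the decorated function.
    @functools.wraps( func ) # keep the wrapped function's name and docstring
    def func_wrapper( *args, **kwargs ):
        result = func( *args, **kwargs )
        gc.collect()
        return result
    return func_wrapper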

@requiresGC
def bar():
    class Foo( object ):
        pass

bar()
print( gc.collect() ) # prints 0

Note, however, that this decorator will cause a significant slowdown if bar() is called regularly. In my case (serialization) it is not, and keeping the gc overhead confined to a few specific functions seems a reasonable compromise.

Thanks to everyone who took the time to answer so quickly! :-)

Clemens Sielaff asked Mar 20 '14 23:03




1 Answer

Declaring a new-style class - either statically or via type - creates a circular reference (actually, more than one). Here's the clearest example I can provide:

class Baz:
    pass

print(Baz in Baz.__mro__)
#True

There are a few other circular refs in Baz's __dict__ too, but one is all you need.
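As a sketch of where those other references live, assuming CPython 3 and a class without __slots__: the descriptors that are generated automatically for `__dict__` and `__weakref__` are stored in the class namespace, and each one points back at the class through its `__objclass__` attribute.

```python
class Baz:
    pass


# The MRO tuple is referenced by the class and contains the class itself.
mro_cycle = Baz in Baz.__mro__

# The auto-generated descriptors in the class namespace refer back to the class.
dict_descr_cycle = vars(Baz)["__dict__"].__objclass__ is Baz
weakref_descr_cycle = vars(Baz)["__weakref__"].__objclass__ is Baz

print(mro_cycle, dict_descr_cycle, weakref_descr_cycle)
```

Any one of these back-references is enough to keep the class's reference count above zero once the last outside reference is gone.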

There's not really any workaround I can offer you - this is what the GC is there for, I'm afraid. I can point you to this bug report that's been around for a while if you'd like to dive in further.

roippi answered Oct 02 '22 10:10