Python: exec statement and unexpected garbage collector behavior

I found a problem with exec (it came up in a system that has to be extensible with user-written scripts). I reduced the problem to this code:

def fn():
    context = {}
    exec '''
class test:
    def __init__(self):
        self.buf = '1'*1024*1024*200
x = test()''' in context

fn()

I expected the garbage collector to free the memory after the call to fn(). However, the Python process still holds the additional 200MB, and I have no idea what is happening here or how to release the allocated memory manually.

I suspect that defining a class inside exec is not a very bright idea, but, first of all, I want to understand what is going wrong in the example above.

Wrapping the creation of the class instance in another function seems to solve the problem, but what is the difference?

def fn():
    context = {}
    exec '''
class test:
    def __init__(self):
        self.buf = '1'*1024*1024*200
def f1(): x = test()
f1()
    ''' in context
fn()

This is my Python interpreter version:

$ python
Python 2.7 (r27:82500, Sep 16 2010, 18:02:00) 
[GCC 4.5.1 20100907 (Red Hat 4.5.1-3)] on linux2

asked Jun 09 '11 17:06 by 3xter


1 Answer

The 200MB stays allocated longer than you expect because you have a reference cycle. context is a dict referencing both x and test. x is an instance of test, and that instance references its class, test. test has a dict of attributes, test.__dict__, which contains the __init__ function for the class. The __init__ function in turn references the globals it was defined with, which is the dict you passed to exec: context.
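The cycle can be observed directly with a minimal sketch. It assumes Python 3 (exec is called as a function), shrinks the payload from 200MB to 1MB, and uses weakref only to check whether the instance is still alive without keeping it alive. It also runs the question's f1() variant: there x is a local variable of f1, so the instance (and its large buffer) is freed by plain reference counting as soon as f1 returns, while only the small class/namespace cycle is left for the collector. That would explain why wrapping the instantiation appears to fix the problem. The names probe and make_ref are hypothetical hooks injected purely for observation.

```python
import gc
import weakref

# Same shape as the question's script, with a 1 MB payload instead of 200 MB.
SRC = '''
class test:
    def __init__(self):
        self.buf = '1' * 1024 * 1024
x = test()
'''

# The question's f1() variant; "probe" and "make_ref" are hypothetical
# names injected only so we can observe the instance from outside.
SRC_WRAPPED = '''
class test:
    def __init__(self):
        self.buf = '1' * 1024 * 1024
def f1():
    x = test()
    probe.append(make_ref(x))
f1()
'''

def fn():
    context = {}
    exec(SRC, context)
    # One edge of the cycle: the method's globals *are* the exec namespace.
    assert context['test'].__init__.__globals__ is context
    return weakref.ref(context['x'])   # weak, so it does not keep x alive

def fn_wrapped():
    probe = []
    exec(SRC_WRAPPED, {'probe': probe, 'make_ref': weakref.ref})
    return probe[0]

gc.disable()   # keep the automatic collector from running mid-demo
try:
    ref = fn()
    alive_after_return = ref() is not None   # kept alive by the cycle
    freed_in_f1 = fn_wrapped()() is None     # freed by refcounting alone
    gc.collect()                             # the cycle detector reclaims fn()'s namespace
    alive_after_collect = ref() is not None
finally:
    gc.enable()

print(alive_after_return, freed_in_f1, alive_after_collect)  # True True False
```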

Python will break these reference cycles for you (since nothing involved has a __del__ method), but only when the cyclic garbage collector runs. It runs automatically once the number of allocations minus deallocations exceeds a threshold (see gc.set_threshold()), so the "leak" will go away at some point. If you want it to go away immediately, you can call gc.collect() yourself, or break the reference cycle yourself before leaving the function. The easiest way to do the latter is to call context.clear(), but be aware that this affects every instance of the class you created in it: the class's methods use context as their globals, so any method that refers to a name defined by the exec'd code will no longer find it.
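The context.clear() remedy and its caveat can be sketched the same way, again assuming Python 3 and a 1MB payload. The clone() method, fn_clear and clear_breaks_methods are hypothetical names added for illustration. With the automatic collector switched off, clearing the namespace still frees the instance immediately, because dropping the dict entry takes its reference count to zero. An instance kept alive past the clear, however, raises NameError as soon as one of its methods looks up the global name test.

```python
import gc
import weakref

# Same shape as the question's class, plus a hypothetical clone() method
# that looks up the global name "test" in the exec namespace.
SRC = '''
class test:
    def __init__(self):
        self.buf = '1' * 1024 * 1024
    def clone(self):
        return test()
x = test()
'''

def fn_clear():
    context = {}
    exec(SRC, context)
    ref = weakref.ref(context['x'])
    context.clear()   # break the cycle before leaving the function
    return ref

def clear_breaks_methods():
    context = {}
    exec(SRC, context)
    survivor = context['x']   # an instance still in use after the clear
    context.clear()
    try:
        survivor.clone()      # "test" is no longer in the method's globals
    except NameError:
        return True
    return False

# The thresholds that drive automatic collection (values vary by version).
print(gc.get_threshold())

gc.disable()   # show the collector plays no part in freeing the instance
try:
    freed_without_collect = fn_clear()() is None
finally:
    gc.enable()

print(freed_without_collect)    # True
print(clear_breaks_methods())   # True
```

Of the two remedies, gc.collect() leaves the exec namespace intact, whereas context.clear() frees the memory deterministically but invalidates the namespace for anything that still depends on it.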


answered Oct 08 '22 09:10 by Thomas Wouters