Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python: Memory leak debugging

People also ask

Does valgrind work with Python?

Valgrind is used periodically by Python developers to try to ensure there are no memory leaks or invalid memory reads/writes. If you want to use Valgrind more effectively and catch even more memory leaks, you will need to configure python --without-pymalloc.

How do you run a memory profiling in Python?

The easiest way to profile a single method or function is the open source memory-profiler package. It's similar to line_profiler , which I've written about before . You can use it by putting the @profile decorator around any function or method and running python -m memory_profiler myscript.

How do I use Tracemalloc in Python?

To trace most memory blocks allocated by Python, the module should be started as early as possible by setting the PYTHONTRACEMALLOC environment variable to 1 , or by using -X tracemalloc command line option. The tracemalloc. start() function can be called at runtime to start tracing Python memory allocations.


See http://opensourcehacker.com/2008/03/07/debugging-django-memory-leak-with-trackrefs-and-guppy/ . Short answer: if you're running django but not in a web-request-based format, you need to manually run db.reset_queries() (and of course have DEBUG=False, as others have mentioned). Django automatically does reset_queries() after a web request, but in your format, that never happens.


Is DEBUG=False in settings.py?

If not Django will happily store all the SQL queries you make which adds up.


Have you tried gc.set_debug() ?

You need to ask yourself simple questions:

  • Am I using objects with __del__ methods? Do I absolutely, unequivocally, need them?
  • Can I get reference cycles in my code? Can't we break these circles before getting rid of the objects?

See, the main issue would be a cycle of objects containing __del__ methods:

import gc

class A(object):
    def __del__(self):
        print 'a deleted'
        if hasattr(self, 'b'):
            delattr(self, 'b')

class B(object):
    def __init__(self, a):
        self.a = a
    def __del__(self):
        print 'b deleted'
        del self.a


def createcycle():
    a = A()
    b = B(a)
    a.b = b
    return a, b

gc.set_debug(gc.DEBUG_LEAK)

a, b = createcycle()

# remove references
del a, b

# prints:
## gc: uncollectable <A 0x...>
## gc: uncollectable <B 0x...>
## gc: uncollectable <dict 0x...>
## gc: uncollectable <dict 0x...>
gc.collect()

# to solve this we break explicitely the cycles:
a, b = createcycle()
del a.b

del a, b

# objects are removed correctly:
## a deleted
## b deleted
gc.collect()

I would really encourage you to flag objects / concepts that are cycling in your application and focus on their lifetime: when you don't need them anymore, do we have anything referencing it?

Even for cycles without __del__ methods, we can have an issue:

import gc

# class without destructor
class A(object): pass

def createcycle():
    # a -> b -> c 
    # ^         |
    # ^<--<--<--|
    a = A()
    b = A()
    a.next = b
    c = A()
    b.next = c
    c.next = a
    return a, b, b

gc.set_debug(gc.DEBUG_LEAK)

a, b, c = createcycle()
# since we have no __del__ methods, gc is able to collect the cycle:

del a, b, c
# no panic message, everything is collectable:
##gc: collectable <A 0x...>
##gc: collectable <A 0x...>
##gc: collectable <dict 0x...>
##gc: collectable <A 0x...>
##gc: collectable <dict 0x...>
##gc: collectable <dict 0x...>
gc.collect()

a, b, c = createcycle()

# but as long as we keep an exterior ref to the cycle...:
seen = dict()
seen[a] = True

# delete the cycle
del a, b, c
# nothing is collected
gc.collect()

If you have to use "seen"-like dictionaries, or history, be careful that you keep only the actual data you need, and no external references to it.

I'm a bit disappointed now by set_debug, I wish it could be configured to output data somewhere else than to stderr, but hopefully that should change soon.


See this excellent blog post from Ned Batchelder on how they traced down real memory leak in HP's Tabblo. A classic and worth reading.


I think you should use different tools. Apparently, the statistics you got is only about GC objects (i.e. objects which may participate in cycles); most notably, it lacks strings.

I recommend to use Pympler; this should provide you with more detailed statistics.


Do you use any extension? They are a wonderful place for memory leaks, and will not be tracked by python tools.