 

Python Garbage Collection sometimes not working in Jupyter Notebook

I'm constantly running out of RAM with some Jupyter Notebooks and I seem to be unable to release memory that is no longer needed. Here is an example:

import gc
thing = Thing()
result = thing.do_something(...)
thing = None
gc.collect()

As you can guess, thing uses a lot of memory to do its work, and once it's done I don't need it anymore, so I should be able to release the memory it uses. Even though it doesn't write to any variables that I can still access from my notebook, the garbage collector isn't freeing up the space. The only workaround I've found is writing result to a pickle, restarting the kernel, loading result back from the pickle, and continuing. This is really inconvenient in long notebooks. How can I free up memory properly?
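For reference, the workaround looks roughly like this (the file name is just a placeholder):

import pickle

# before restarting the kernel: persist the result you want to keep
with open("result.pkl", "wb") as f:
    pickle.dump(result, f)

# ...restart the kernel, then load it back and continue...
with open("result.pkl", "rb") as f:
    result = pickle.load(f)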

asked Mar 27 '18 by Atte Juvonen

2 Answers

There are two issues at play here. The first is that IPython (which Jupyter uses behind the scenes) keeps additional references to objects when you see something like Out[67]. In fact, you can use that syntax to recall the object and do something with it, e.g. str(Out[67]). The second is that Jupyter seems to keep its own references to output variables, so only a full reset of IPython will work. But that's not much different from just restarting the kernel.
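For what it's worth, IPython also ships magics aimed at this output cache; something like the following may help (where obj is whichever variable you want gone), though it won't necessarily catch every reference Jupyter keeps:

%xdel obj     # ask IPython to drop its own internal references to obj as well
%reset out    # clear the Out[...] output cache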

There is a solution though! I wrote a function that you can run that will clear all variables, except the ones you explicitly ask to keep.

def my_reset(*varnames):
    """
    varnames are what you want to keep
    """
    globals_ = globals()
    to_save = {v: globals_[v] for v in varnames}
    to_save['my_reset'] = my_reset  # let's keep this function by default
    del globals_
    get_ipython().magic("reset")
    globals().update(to_save)

You would use it like:

x = 1
y = 2
my_reset('x')
assert 'y' not in globals()
assert x == 1
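
If your IPython version warns that .magic() is deprecated, the same reset can be written with run_line_magic instead (a sketch, assuming a recent IPython; -f skips the confirmation prompt):

get_ipython().run_line_magic("reset", "-f")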

Below is a notebook I wrote that shows a little of what is going on behind the scenes, and how you can use the weakref module to see when something has truly been deleted. Try running it to see if it helps you understand what is happening.

In [1]: class MyObject:
            pass

In [2]: obj = MyObject()

In [3]: # now let's try deleting the object
        # First, create a weak reference to obj, so we can know when it is truly deleted.
        from weakref import ref
        from sys import getrefcount
        r = ref(obj)
        print("the weak reference looks like", r)
        print("it has a reference count of", getrefcount(r()))
        # this prints a ref count of 2 (1 for obj, plus 1 for the temporary
        # reference getrefcount holds on its argument)
        del obj
        # since obj was the only strong reference to the object, it should have been 
        # garbage collected now.
        print("the weak reference looks like", r)

the weak reference looks like <weakref at 0x7f29a809d638; to 'MyObject' at 0x7f29a810cf60>
it has a reference count of 2
the weak reference looks like <weakref at 0x7f29a809d638; dead>

In [4]: # let's try again, but this time we won't print obj; we'll just evaluate
        # obj so that Jupyter displays it
        obj = MyObject()

In [5]: print(getrefcount(obj))
        obj

2
Out[5]: <__main__.MyObject at 0x7f29a80a0c18>

In [6]: # note the "Out[5]". This is a second reference to our object
        # and will keep it alive if we delete obj
        r = ref(obj)
        del obj
        print("the weak reference looks like", r)
        print("with a reference count of:", getrefcount(r()))

the weak reference looks like <weakref at 0x7f29a809db88; to 'MyObject' at 0x7f29a80a0c18>
with a reference count of: 7

In [7]: # So what happened? It's that Out[5] that is keeping the object alive.
        # If we clear our Out variables it should go away...
        # As it turns out, Jupyter keeps a number of its own variables lying around,
        # so we have to reset pretty much everything.

In [8]: def my_reset(*varnames):
            """
            varnames are what you want to keep
            """
            globals_ = globals()
            to_save = {v: globals_[v] for v in varnames}
            to_save['my_reset'] = my_reset  # let's keep this function by default
            del globals_
            get_ipython().magic("reset")
            globals().update(to_save)

        my_reset('r') # clear everything except our weak reference to the object
        # in your notebook, you would use this to keep "result" around

Once deleted, variables cannot be recovered. Proceed (y/[n])? y

In [9]: print("the weak reference looks like", r)

the weak reference looks like <weakref at 0x7f29a809db88; dead>
answered Nov 18 '22 by Dunes

I had the same issue, and after many hours of struggle the solution that worked for me was very simple: put all of your code into a single cell. Within a cell, garbage collection works normally; it is only after you leave the cell that the variables pick up the extra references and become uncollectible.

For long notebooks this can be inconvenient and hard to read, but the idea is that you can garbage-collect variables within the cell that created them. So you may be able to organize your code so that you call gc.collect() at the end of a cell, before leaving it.
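
For example, something along these lines (using the hypothetical Thing from the question), all in one cell:

import gc

thing = Thing()                   # hypothetical class from the question
result = thing.do_something(...)  # keep only what you actually need
del thing                         # drop the only strong reference
gc.collect()                      # collect before leaving the cell
# avoid ending the cell with a bare `thing`, or Out[...] will hold another reference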

Hope this helps :)

answered Nov 18 '22 by Aneta Baloyan