I'm constantly running out of RAM with some Jupyter Notebooks and I seem to be unable to release memory that is no longer needed. Here is an example:
import gc
thing = Thing()
result = thing.do_something(...)
thing = None
gc.collect()
As you can presume, thing uses a lot of memory to do something, and then I don't need it anymore. I should be able to release the memory it uses. Even though it doesn't write to any variables that I can access from my notebook, the garbage collector isn't freeing up the space properly. The only workaround I've found is writing result into a pickle, restarting the kernel, loading result from the pickle, and continuing. This is really inconvenient when running long notebooks. How can I free up memory properly?
There are a number of issues at play here. The first is that IPython (what Jupyter uses behind the scenes) keeps additional references to objects whenever you see something like Out[67]. In fact, you can use that syntax to recall the object and do something with it, e.g. str(Out[67]). The second problem is that Jupyter seems to keep its own references to output variables, so only a full reset of IPython will work. But that's not much different from just restarting the notebook.
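To make the hidden-reference problem concrete without needing a notebook, here is a minimal plain-Python sketch. It assumes CPython, and fake_out is only a hypothetical stand-in for IPython's output cache, not the real thing; Big is likewise an invented placeholder class.

```python
import gc
import weakref


class Big:
    """Stand-in for an object that holds a lot of memory."""

    def __init__(self):
        self.payload = bytearray(10_000_000)


# Hypothetical stand-in for IPython's output cache (the Out dict).
fake_out = {}

obj = Big()
fake_out[5] = obj         # roughly what displaying obj as a cell result does
probe = weakref.ref(obj)  # a weak reference does not keep the object alive

del obj
gc.collect()
alive_while_cached = probe() is not None  # the cache entry still holds it

fake_out.clear()          # drop the hidden reference
gc.collect()
alive_after_clear = probe() is not None

print("alive while cached:", alive_while_cached)
print("alive after clearing the cache:", alive_after_clear)
```

Deleting the name alone is not enough; the object only goes away once the last strong reference, here the cache entry, is dropped too.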
There is a solution though! I wrote a function that you can run that will clear all variables, except the ones you explicitly ask to keep.
def my_reset(*varnames):
    """
    varnames are what you want to keep
    """
    globals_ = globals()
    to_save = {v: globals_[v] for v in varnames}
    to_save['my_reset'] = my_reset  # let's keep this function by default
    del globals_
    # run_line_magic replaces the deprecated get_ipython().magic("reset")
    get_ipython().run_line_magic("reset", "")
    globals().update(to_save)
You would use it like:
x = 1
y = 2
my_reset('x')
assert 'y' not in globals()
assert x == 1
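The save, reset, restore pattern behind my_reset can also be tried in isolation. The sketch below is an illustration only: it uses an ordinary dict as a hypothetical stand-in for the notebook namespace (including a fake "Out" entry), and keep_only and Big are invented names, not part of IPython. It assumes CPython.

```python
import gc
import weakref


class Big:
    """Stand-in for a large object."""


def keep_only(namespace, *names):
    """Drop every entry of namespace except the given names."""
    to_save = {name: namespace[name] for name in names}
    namespace.clear()          # plays the role of the full reset
    namespace.update(to_save)  # restore only what was asked for


big_result = Big()
big_scratch = Big()
scratch_probe = weakref.ref(big_scratch)

# A dict playing the role of the notebook namespace, with a fake output cache
# that holds a second reference to the scratch object.
namespace = {
    "result": big_result,
    "scratch": big_scratch,
    "Out": {1: big_scratch},
}
del big_result, big_scratch

keep_only(namespace, "result")
gc.collect()

print(sorted(namespace))
print("scratch freed:", scratch_probe() is None)
```

Because the whole namespace is cleared, the reference hiding in the fake output cache goes away together with the variable itself, which is why a full reset works where a plain del does not.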
Below I wrote a notebook that shows you a little bit of what is going on behind the scenes and how you can see when something has truly been deleted by using the weakref module. You can try running it to see if it helps you understand what is going on.
In [1]: class MyObject:
    pass
In [2]: obj = MyObject()
In [3]: # now let's try deleting the object
# First, create a weak reference to obj, so we can know when it is truly deleted.
from weakref import ref
from sys import getrefcount
r = ref(obj)
print("the weak reference looks like", r)
print("it has a reference count of", getrefcount(r()))
# this prints a ref count of 2 (1 for obj and 1 because getrefcount
# had a reference to obj)
del obj
# since obj was the only strong reference to the object, it should have been
# garbage collected now.
print("the weak reference looks like", r)
the weak reference looks like <weakref at 0x7f29a809d638; to 'MyObject' at 0x7f29a810cf60>
it has a reference count of 2
the weak reference looks like <weakref at 0x7f29a809d638; dead>
In [4]: # let's try again, but this time we won't print obj, we'll just evaluate "obj"
obj = MyObject()
In [5]: print(getrefcount(obj))
obj
2
Out[5]: <__main__.MyObject at 0x7f29a80a0c18>
In [6]: # note the "Out[5]". This is a second reference to our object
# and will keep it alive if we delete obj
r = ref(obj)
del obj
print("the weak reference looks like", r)
print("with a reference count of:", getrefcount(r()))
the weak reference looks like <weakref at 0x7f29a809db88; to 'MyObject' at 0x7f29a80a0c18>
with a reference count of: 7
In [7]: # So what happened? It's that Out[5] that is keeping the object alive.
# If we clear our Out variables it should go away...
# As it turns out, Jupyter keeps a number of its own variables lying around,
# so we have to reset pretty much everything.
In [8]: def my_reset(*varnames):
    """
    varnames are what you want to keep
    """
    globals_ = globals()
    to_save = {v: globals_[v] for v in varnames}
    to_save['my_reset'] = my_reset  # let's keep this function by default
    del globals_
    # run_line_magic replaces the deprecated get_ipython().magic("reset")
    get_ipython().run_line_magic("reset", "")
    globals().update(to_save)

my_reset('r')  # clear everything except our weak reference to the object
# you would use this to keep "thing" around.
Once deleted, variables cannot be recovered. Proceed (y/[n])? y
In [9]: print("the weak reference looks like", r)
the weak reference looks like <weakref at 0x7f29a809db88; dead>
I had the same issue, and after many hours of struggle, the solution that worked for me was very simple: put all your code into a single cell. Within a single cell, garbage collection works normally; it is only after the cell finishes that the variables pick up the extra references and become uncollectible.
For long notebooks this may be inconvenient and hard to read, but the idea is that you can perform garbage collection within a cell for the variables created in that cell. So maybe you could organize your code so that you call gc.collect() at the end of each cell before leaving it.
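A related way to keep the heavy object confined, sketched here as an assumption rather than something tested in the answer above, is to do the work inside a function so the object is only ever a local variable. Thing, do_something and compute are hypothetical stand-ins modeled on the question, and the immediate cleanup relies on CPython's reference counting.

```python
import gc
import weakref


class Thing:
    """Hypothetical stand-in for the memory-hungry object in the question."""

    def __init__(self):
        self.buffer = bytearray(10_000_000)

    def do_something(self):
        return len(self.buffer)


probes = []  # weak references only, used to check the object really went away


def compute():
    thing = Thing()                    # local name, never enters the notebook namespace
    probes.append(weakref.ref(thing))  # weak, so it does not keep thing alive
    return thing.do_something()        # only the small result escapes


result = compute()
gc.collect()

print("result:", result)
print("thing freed:", probes[0]() is None)
```

Since the object is never the value of a cell and never bound to a global name, there is nothing for the notebook to hold on to once the function returns.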
Hope this helps :)