How to clear GPU memory after PyTorch model training without restarting kernel

Tags: python, jupyter, pytorch

I am training PyTorch deep learning models in a JupyterLab notebook, using CUDA on a Tesla K80 GPU. During the training iterations, the 12 GB of GPU memory are in use. I finish training by saving the model checkpoint, but I want to keep using the notebook for further analysis (inspecting intermediate results, etc.).

However, these 12 GB remain occupied (as seen in nvtop) after training has finished. I would like to free this memory so that I can use it for other notebooks.

My solution so far is to restart the notebook's kernel, but that does not solve my problem, because I can then no longer continue working with the same notebook and the output it has computed so far.

asked Sep 09 '19 17:09

Glyph

3 Answers

The answers so far are correct for the CUDA side of things, but there is also an issue on the IPython side.

When a cell raises an error in a notebook environment, the IPython shell stores the traceback of the exception so that you can inspect the error state with %debug. The catch is that the traceback keeps a reference to every stack frame involved in the error, so all local variables in those frames stay in memory and are not reclaimed by gc.collect(). In effect, those variables are stuck and their memory is leaked.

Usually, raising a new exception frees the state of the old one, so running something like 1/0 may help. However, things can get weird with CUDA variables, and sometimes there is no way to clear your GPU memory without restarting the kernel.
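The mechanism can be reproduced without a GPU. The following is a minimal sketch, not IPython's actual implementation: BigBuffer, failing_step, and saved are hypothetical names, and saved stands in for the exception that the shell keeps around (in an interactive session it is, as far as I know, reachable through sys.last_value and sys.last_traceback). It shows a local variable surviving gc.collect() while the traceback is held, and being released once the standard-library function traceback.clear_frames() clears the frames.

```python
import gc
import traceback
import weakref


class BigBuffer:
    """Hypothetical stand-in for a large object such as a CUDA tensor."""


def failing_step(registry):
    buf = BigBuffer()                   # local variable in this frame
    registry.append(weakref.ref(buf))   # lets us observe its lifetime
    raise RuntimeError("simulated failure while buf is a local")


refs = []
try:
    failing_step(refs)
except RuntimeError as exc:
    saved = exc  # mimics a shell that keeps the last exception

# The saved exception references the traceback, which references the
# frame of failing_step, which still holds buf.
gc.collect()
alive_before = refs[0]() is not None
print(alive_before)  # True

# Clear the locals of every finished frame in the traceback.
traceback.clear_frames(saved.__traceback__)
gc.collect()
alive_after = refs[0]() is not None
print(alive_after)  # False
```

In a notebook, the analogous step would be calling traceback.clear_frames(sys.last_traceback), assuming the shell has set that attribute; whether this is enough to release CUDA memory in a given session is not guaranteed.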

For more detail see these references:

https://github.com/ipython/ipython/pull/11572

How to save traceback / sys.exc_info() values in a variable?

answered Oct 16 '22 08:10

Karl

If you set the object that uses a lot of memory to None, like this:

obj = None

And after that you call

import gc
gc.collect()  # Python's garbage collector

This is how you may avoid restarting the notebook.

If you would still like to see the memory released in nvidia-smi or nvtop, you may run:

torch.cuda.empty_cache() # PyTorch thing

to empty the PyTorch cache.
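Putting the steps above together, a minimal sketch might look like this. free_cuda_cache is a hypothetical helper name, not a PyTorch API, and the guarded import is only there so the snippet also runs on a machine without PyTorch or without a GPU. It relies on torch.cuda.empty_cache(), torch.cuda.memory_allocated(), and torch.cuda.memory_reserved().

```python
import gc

try:
    import torch
except ImportError:  # allows the sketch to run where PyTorch is absent
    torch = None


def free_cuda_cache():
    """Run the garbage collector, then release PyTorch's cached CUDA blocks.

    Returns a dict with the bytes still allocated and reserved on the
    current device, or None if PyTorch or CUDA is unavailable.
    """
    gc.collect()
    if torch is None or not torch.cuda.is_available():
        return None
    torch.cuda.empty_cache()
    return {
        "allocated_bytes": torch.cuda.memory_allocated(),
        "reserved_bytes": torch.cuda.memory_reserved(),
    }


# Drop your own references first (for example: model = None), then:
stats = free_cuda_cache()
print(stats)
```

Only tensors that nothing references any more can be released, so every name that points at the model, optimizer, or stored outputs has to be reset first. Even then, nvidia-smi will typically still show some memory in use by the process, because the CUDA context itself stays allocated until the process exits.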

answered Oct 16 '22 10:10

prosti

import torch

with torch.no_grad():
    torch.cuda.empty_cache()

answered Oct 16 '22 09:10

Maunish Dave
