
VS code with Jupyter notebook is extremely slow when re-running cells

I’m having trouble with Jupyter in VS Code. On VS Code start-up, my 10 cells “run all” in about 3-4 minutes in a perfectly sensible way (loading the data and heavy calculations take time, but plotting is fast, etc). But re-running cells sometimes takes forever.

As an example, the last cell, which is just plotting the data, took 1.6s on 'run all'. When I instantly re-run that last cell (even without changing the code at all), it takes much longer to run (up to 14 minutes!). Why is that?

It isn’t a CPU or memory problem because those climb to only 30% and 75% during the runs, respectively.

I’ve seen the same error on a Jupyter notebook server. The solution was to disable “Variable Inspector nbextension”. Is there an equivalent solution when running jupyter notebooks on vs code?

UPDATE: It's only the case when the "Jupyter: Variables" pane is open. A lot of variables are shown as loading endlessly. The problem might be that the variables are too big to show?

Versions: vs code 1.73.1 / Jupyter v2022.9.1303220346 / Python 3.10.8 / Windows 11

Note: I have pandas variables, just like in the problem in the above link.

The Authors asked Nov 18 '25 11:11


1 Answer

Possible Cause

Without a specific code example, I can't be sure what is causing your problem.

However, a common cause of this problem is the "Jupyter: Variables" window. When this window is open in VS Code, Spyder, or another Jupyter-based IDE, the IDE runs a piece of Python code to get a "representation" of each of your variables to show in the variables window.

The IDE does this by calling Python's repr() function, roughly like this:

# Pseudocode: print_to_variable_window stands in for the IDE's internal display call
# representation is a string
representation = repr(your_python_object)
print_to_variable_window(representation)

If generating that representation is slow, your code can run quickly the first time yet take a long time on re-runs, because Jupyter waits until the variable window has been updated before it actually runs your code.
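To make the mechanism concrete, here is a minimal sketch of a variable viewer refresh. The names CountingRepr and refresh_variable_pane are hypothetical stand-ins and not VS Code's actual implementation; the only assumption is that each refresh calls repr() on every user variable.

```python
class CountingRepr:
    """Hypothetical object that records how often its repr is requested."""

    def __init__(self):
        self.calls = 0

    def __repr__(self):
        self.calls += 1
        return f"CountingRepr(calls={self.calls})"


def refresh_variable_pane(namespace):
    """Hypothetical stand-in for a variable viewer: one preview string per variable."""
    return {
        name: repr(value)
        for name, value in namespace.items()
        if not name.startswith("_")  # skip private/system names
    }


namespace = {"obj": CountingRepr(), "n": 3}
first = refresh_variable_pane(namespace)
second = refresh_variable_pane(namespace)
print(first["obj"])   # CountingRepr(calls=1)
print(second["obj"])  # CountingRepr(calls=2)
```

Every refresh pays the full repr() cost of every variable again, so one expensive object is enough to slow down each re-run.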

Reproducing the Issue

If you define a Python object whose __repr__() method takes a long time to run, the "Jupyter: Variables" window can take a long time to load. This can happen with a really long string or list, or with any other object for which the system has to build a representation of a lot of data. Or (mainly for demo purposes) there could be a sleep() call within the repr, as shown below:

import time

class Hello:
    def __init__(self):
        self.count = 0

    def __repr__(self):
        self.count += 1
        time.sleep(10)
        return f"Hello. Repr has been called {self.count} times"

# %% Next Jupyter Cell

hello = Hello()
print("Hi")

When I run this in VS Code with the "Jupyter: Variables" window open, the code runs very quickly the first time, and I see "Hello. Repr has been called 1 times" in the variable window after about 50 seconds.

If I try to run either cell again, it takes noticeably longer to run, and I eventually see "Hello. Repr has been called 5 times" in the "Jupyter: Variables" window.

Possible Solutions

  1. Close the "Jupyter: Variables" window. The issue is specific to the way this window works, so you should not see it while the window is closed.

  2. Restart the kernel. Restarting the kernel means you cannot simply re-run individual cells, but it will allow you to do "run all" again, and it stops the computer from spending a lot of time trying to generate the variable previews.

  3. Identify the variable in your Python environment that is causing your code to run so slowly. Most likely, one variable takes a long time to create its representation, and that is the direct cause of the issue.

    This could be as simple as having a huge object such as:

    # Warning! This builds a list of 10 million references to one 70 MB string
    # (~150 MB of RAM in total); its full repr() would be far too large to finish.
    x = ["abcdefg" * 10_000_000] * 10_000_000
    

    To find out whether there is a particular offending variable, you could run a little script like this in a Jupyter cell.

    for key, value in globals().copy().items():
        # Ignore special system objects
        if key.startswith("_"):
            continue
        # Flush so the name is shown before a slow repr() starts
        print(key, flush=True)
        print(repr(value))
    

    The script will hang for a long time right after printing the name of the variable that is causing the issue.

  4. Use reprlib

    You can use the standard-library reprlib module to create safe __repr__() methods that will not cause this problem.

    For example, instead of creating a big variable x that will not play nicely, you can create a wrapper around it like this:

    from reprlib import repr as safe_repr
    
    class MyStrHolder:
        def __init__(self, x):
            self.x = x
    
        def __repr__(self):
            return safe_repr(self.x)
    

    This lets you hold a large string without its representation slowing down the variable window.

    Note that many libraries such as NumPy and pandas already come with custom reprs to prevent their classes from causing this issue.
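As a rough sketch of solutions 3 and 4 together, the snippet below times repr() for each variable in a namespace and then shows how reprlib.repr() shortens a large container. The helper time_reprs, the SlowRepr class, and demo_namespace are hypothetical names made up for this illustration; in a notebook you would pass globals() instead of the demo dictionary.

```python
import reprlib
import time


def time_reprs(namespace, repr_func=repr):
    """Return (name, seconds) pairs, slowest first, for repr_func on each value."""
    timings = []
    for name, value in list(namespace.items()):
        if name.startswith("_"):  # ignore special system objects
            continue
        start = time.perf_counter()
        repr_func(value)
        timings.append((name, time.perf_counter() - start))
    return sorted(timings, key=lambda pair: pair[1], reverse=True)


class SlowRepr:
    """Hypothetical stand-in for an object whose repr is expensive."""

    def __repr__(self):
        time.sleep(0.2)
        return "SlowRepr()"


demo_namespace = {"small": 42, "slow": SlowRepr(), "big_list": list(range(200_000))}

for name, seconds in time_reprs(demo_namespace):
    print(f"{name:10s} {seconds:.3f}s")

# reprlib.repr only looks at the first few items of a container
print(reprlib.repr(demo_namespace["big_list"]))  # [0, 1, 2, 3, 4, 5, ...]
```

Unlike the print-based loop, this version cannot show which variable it is stuck on, so it is only useful when every repr() eventually finishes. Also note that reprlib shortens built-in containers and strings cheaply, but for an arbitrary object it still calls that object's own __repr__() and merely truncates the result, so it would not speed up something like SlowRepr.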

The Matt answered Nov 20 '25 01:11