 

Synchronizing code between Jupyter/IPython notebook script and class methods

I'm trying to figure out the best way to keep code in a Jupyter/IPython notebook and the same code inside a class method in sync. Here's the use case:

I wrote a long script that uses pandas inside a notebook, split across multiple cells, which made development easy because I could check intermediate results within the notebook. This is very useful with pandas scripts. I downloaded that working code as a Python ".py" file and converted the script into a method of a Python class in my program; the class is instantiated with the input data, and the method returns the output. Everything works great. That Python class is used in a much larger application, so it is the real deliverable.

But then a certain data set exposed a bug in the method, and the same bug was in my script. I could go back to my notebook and step through the various cells to find the issue. I fixed it there, but then I had to carefully port the change back into the regular class method code. This is a bit painful.

Ideally, I'd like to be able to run a class method across cells, so I can check intermediate results. I can't figure out how to do this.

So what is the best practice for keeping script code and code embedded within a class method in sync?

Yes, I know that I can import the class into the notebook, but then I lose the ability to look at intermediate results inside the class method via individual cells, which is what I do when it is a pure script. With pandas, this is very useful.

Asked Aug 04 '16 at 14:08 by Irv





1 Answer

I have used the same development workflow and recognize the value of being able to step through code in a Jupyter notebook. I've developed several packages by first hashing out the details in a notebook and then eventually moving the polished product into separate .py files. I do not think there is a simple solution to the inconvenience you describe (I have run into the same issues), but I will describe my practice (I'm not so bold as to proclaim it the "best" practice), and maybe it will be helpful in your use case.

In my experience, once I have created a module/package from my Jupyter notebook, it is easier to maintain and develop the code outside of the notebook and import that module into the notebook for testing.

Keeping each method small is good practice in general, and it is very helpful for testing the logic at each step in the notebook. You can break larger "public" methods into smaller "private" methods named with a leading underscore (e.g. '_load_file'). You can call the "private" methods in your notebook for testing/debugging, but users of your module should know to ignore them.
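A minimal sketch of that decomposition, using a hypothetical `ReportBuilder` class. The class, its method names, and the data are invented for illustration, and plain lists of dicts stand in for the pandas DataFrames from the question; the structure is the same either way. The public `run()` method is just a chain of small private steps, so the notebook can call each step on its own and inspect what it returns:

```python
class ReportBuilder:
    """Hypothetical class: run() is a thin pipeline of small private steps."""

    def __init__(self, rows):
        self.rows = rows

    def run(self):
        # Public entry point used by the larger application.
        cleaned = self._drop_missing(self.rows)
        totals = self._total_by_key(cleaned)
        return self._sort_desc(totals)

    def _drop_missing(self, rows):
        # Step 1: discard records that have no amount.
        return [r for r in rows if r.get("amount") is not None]

    def _total_by_key(self, rows):
        # Step 2: sum the amounts for each key.
        totals = {}
        for r in rows:
            totals[r["key"]] = totals.get(r["key"], 0) + r["amount"]
        return totals

    def _sort_desc(self, totals):
        # Step 3: order (key, total) pairs with the largest total first.
        return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


rows = [
    {"key": "a", "amount": 3},
    {"key": "b", "amount": None},
    {"key": "a", "amount": 4},
    {"key": "c", "amount": 5},
]
builder = ReportBuilder(rows)

# In a notebook, each of these lines could live in its own cell so the
# intermediate result can be inspected before moving on to the next step.
step1 = builder._drop_missing(builder.rows)
step2 = builder._total_by_key(step1)
step3 = builder._sort_desc(step2)
```

Because `run()` and the notebook cells call the same private methods, a fix made in the module is exercised by both, and there is no second copy of the logic to keep in sync.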

You can use the reload function in the importlib module to quickly refresh your imported modules with changes made to the source.

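```python
import mymodule
from importlib import reload

# ...edit mymodule.py in your editor, then:
reload(mymodule)  # re-executes mymodule's code and updates the module in place
```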

Calling import again will not pick up your edits: Python finds the module already cached in sys.modules and returns it without re-running its code. You need the reload function (or similar) to force Python to recompile and re-execute the module code.
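To see the difference between a second `import` and `reload`, here is a small self-contained sketch. It writes a throwaway module to a temporary directory (the module name `demo_module` is hypothetical), imports it, rewrites the file to simulate an edit made in an editor, and then compares a plain re-import with `importlib.reload`:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Don't write .pyc files for this throwaway module, so reload always reads the .py source.
sys.dont_write_bytecode = True

# Create a hypothetical module on disk and make its directory importable.
tmpdir = Path(tempfile.mkdtemp())
module_path = tmpdir / "demo_module.py"
module_path.write_text("VALUE = 1\n")
sys.path.insert(0, str(tmpdir))
importlib.invalidate_caches()

import demo_module
before = demo_module.VALUE

# Simulate editing the source file.
module_path.write_text("VALUE = 2  # edited\n")

import demo_module  # served from sys.modules: the old code is NOT re-run
after_import = demo_module.VALUE

importlib.reload(demo_module)  # re-executes the edited source in the same module object
after_reload = demo_module.VALUE

print(before, after_import, after_reload)
```

One caveat: `reload` updates the module object, but names bound earlier with `from demo_module import something`, and instances of classes created before the reload, still refer to the old objects, so re-run those imports and re-create the instances afterwards. IPython also provides an `autoreload` extension (`%load_ext autoreload`, then `%autoreload 2`) that reloads modules automatically before each cell runs.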

Inevitably, you will still need to step through individual functions line by line, but if you've decomposed your code into small methods, the amount of code you need to "re-write" in the notebook is very small.
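One way to step through a single small method in a notebook is to bind `self` by hand to an instance and paste the method body into cells unchanged, so every local variable stays in the notebook namespace for inspection. The sketch below uses a hypothetical `Normalizer` class; the class, its `_scale` method, and the input values are made up for illustration:

```python
class Normalizer:
    """Hypothetical class with one small method we want to step through."""

    def __init__(self, values):
        self.values = values

    def _scale(self):
        lo = min(self.values)
        hi = max(self.values)
        span = hi - lo
        return [(v - lo) / span for v in self.values]


norm = Normalizer([2, 4, 6, 10])

# Cell 1: bind `self` so the method body can be pasted without edits.
self = norm

# Cell 2: the first lines of _scale; lo, hi and span remain inspectable.
lo = min(self.values)
hi = max(self.values)
span = hi - lo

# Cell 3: the rest of the body, with the `return` turned into an assignment.
scaled = [(v - lo) / span for v in self.values]
```

With input where all values are equal, `span` is 0 and the last cell raises `ZeroDivisionError`; having `span` sitting in the namespace makes that kind of data-dependent bug easy to spot. Once fixed, the change goes into the module, and only these few pasted lines are thrown away.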

Answered Nov 04 '22 at 02:11 by Gordon Bean
