I have a Jupyter notebook that I plan to run repeatedly. It has functions in it, the structure of the code is this: <pre class="prettyprint"><code>def construct_url(data): ... return url def scrape_url(url): ... # fetch url, extract data return parsed_data for i in mylist: url = construct_url(i) data = scrape_url(url) ... # use the data to do analysis </code></pre> I'd like to write tests for <code>construct_url</code> and <code>scrape_url</code>. What's the most sensible way to do this? Some approaches I've considered: <ul> <li>Move the functions out into a utility file, and write tests for that utility file in some standard Python testing library. Possibly the best option, though it means that not all of the code is visible in the notebook.</li> <li>Write asserts within the notebook itself, using test data (adds noise to the notebook).</li> <li>Use specialised Jupyter testing to test the content of the cells (don't think this works, because the content of the cells is going to change). </li> </ul>

Python standard testing tools, such as doctest and unittest, can be used directly in a notebook. <h3>Doctest</h3> A notebook cell with a function and a test case in a docstring: <pre class="prettyprint lang-py prettyprint-override"><code>def add(a, b): ''' This is a test: >>> add(2, 2) 5 ''' return a + b </code></pre> A notebook cell (the last one in the notebook) that runs all test cases in the docstrings: <pre class="prettyprint"><code>import doctest doctest.testmod(verbose=True) </code></pre> Output: <pre class="prettyprint lang-none prettyprint-override"><code>Trying: add(2, 2) Expecting: 5 ********************************************************************** File "__main__", line 4, in __main__.add Failed example: add(2, 2) Expected: 5 Got: 4 1 items had no tests: __main__ ********************************************************************** 1 items had failures: 1 of 1 in __main__.add 1 tests in 2 items. 0 passed and 1 failed. ***Test Failed*** 1 failures. </code></pre> <h3>Unittest</h3> A notebook cell with a function: <pre class="prettyprint lang-py prettyprint-override"><code>def add(a, b): return a + b </code></pre> A notebook cell (the last one in the notebook) that contains a test case. The last line in the cell runs the test case when the cell is executed: <pre class="prettyprint lang-py prettyprint-override"><code>import unittest class TestNotebook(unittest.TestCase): def test_add(self): self.assertEqual(add(2, 2), 5) unittest.main(argv=[''], verbosity=2, exit=False) </code></pre> Output: <pre class="prettyprint lang-none prettyprint-override"><code>test_add (__main__.TestNotebook) ... FAIL ====================================================================== FAIL: test_add (__main__.TestNotebook) ---------------------------------------------------------------------- Traceback (most recent call last): File "<ipython-input-15-4409ad9ffaea>", line 6, in test_add self.assertEqual(add(2, 2), 5) AssertionError: 4 != 5 ---------------------------------------------------------------------- Ran 1 test in 0.001s FAILED (failures=1) </code></pre> <h3>Debugging a Failed Test</h3> While debugging a failed test, it is often useful to halt the test case execution at some point and run a debugger. For this, insert the following code just before the line at which you want the execution to halt: <pre class="prettyprint lang-py prettyprint-override"><code>import pdb; pdb.set_trace() </code></pre> For example: <pre class="prettyprint lang-py prettyprint-override"><code>def add(a, b): ''' This is the test: >>> add(2, 2) 5 ''' import pdb; pdb.set_trace() return a + b </code></pre> For this example, the next time you run the doctest, the execution will halt just before the return statement and the Python debugger (pdb) will start. You will get a pdb prompt directly in the notebook, which will allow you to inspect the values of <code>a</code> and <code>b</code>, step over lines, etc. Note: Starting with Python 3.7, the built-in <code>breakpoint()</code> can be used instead of <code>import pdb; pdb.set_trace()</code>. I created a Jupyter notebook for experimenting with the techniques I have just described. You can try it out with <img src="https://mybinder.org/badge_logo.svg" alt="Binder">

Unit tests for functions in a Jupyter notebook?

Tags:

python

unit-testing

testing

jupyter

reproducible-research

I have a Jupyter notebook that I plan to run repeatedly. It has functions in it, the structure of the code is this:

def construct_url(data):     ...     return url  def scrape_url(url):     ... # fetch url, extract data     return parsed_data  for i in mylist:      url = construct_url(i)     data = scrape_url(url)     ... # use the data to do analysis

I'd like to write tests for construct_url and scrape_url. What's the most sensible way to do this?

Some approaches I've considered:

Move the functions out into a utility file, and write tests for that utility file in some standard Python testing library. Possibly the best option, though it means that not all of the code is visible in the notebook.
Write asserts within the notebook itself, using test data (adds noise to the notebook).
Use specialised Jupyter testing to test the content of the cells (don't think this works, because the content of the cells is going to change).

770

asked Oct 21 '16 08:10

Richard

1 Answers

Python standard testing tools, such as doctest and unittest, can be used directly in a notebook.

Doctest

A notebook cell with a function and a test case in a docstring:

def add(a, b):     '''     This is a test:     >>> add(2, 2)     5     '''     return a + b

A notebook cell (the last one in the notebook) that runs all test cases in the docstrings:

import doctest doctest.testmod(verbose=True)

Output:

Trying:     add(2, 2) Expecting:     5 ********************************************************************** File "__main__", line 4, in __main__.add Failed example:     add(2, 2) Expected:     5 Got:     4 1 items had no tests:     __main__ ********************************************************************** 1 items had failures:    1 of   1 in __main__.add 1 tests in 2 items. 0 passed and 1 failed. ***Test Failed*** 1 failures.

Unittest

A notebook cell with a function:

def add(a, b):     return a + b

A notebook cell (the last one in the notebook) that contains a test case. The last line in the cell runs the test case when the cell is executed:

import unittest  class TestNotebook(unittest.TestCase):          def test_add(self):         self.assertEqual(add(2, 2), 5)           unittest.main(argv=[''], verbosity=2, exit=False)

Output:

test_add (__main__.TestNotebook) ... FAIL  ====================================================================== FAIL: test_add (__main__.TestNotebook) ---------------------------------------------------------------------- Traceback (most recent call last):   File "<ipython-input-15-4409ad9ffaea>", line 6, in test_add     self.assertEqual(add(2, 2), 5) AssertionError: 4 != 5  ---------------------------------------------------------------------- Ran 1 test in 0.001s  FAILED (failures=1)

Debugging a Failed Test

While debugging a failed test, it is often useful to halt the test case execution at some point and run a debugger. For this, insert the following code just before the line at which you want the execution to halt:

import pdb; pdb.set_trace()

For example:

def add(a, b):     '''     This is the test:     >>> add(2, 2)     5     '''     import pdb; pdb.set_trace()     return a + b

For this example, the next time you run the doctest, the execution will halt just before the return statement and the Python debugger (pdb) will start. You will get a pdb prompt directly in the notebook, which will allow you to inspect the values of a and b, step over lines, etc.

Note: Starting with Python 3.7, the built-in breakpoint() can be used instead of import pdb; pdb.set_trace().

I created a Jupyter notebook for experimenting with the techniques I have just described. You can try it out with Binder

189

answered Oct 08 '22 06:10

SergiyKolesnikov

Related questions
                            
                                How to extract a filename from a URL & append a word to it?
                            
                                Cannot use Requests-Module on AWS Lambda
                            
                                How do I split a custom dataset into training and test datasets?
                            
                                Text processing - Python vs Perl performance [closed]
                            
                                Type annotations for Enum attribute
                            
                                Finding dead code in large python project [closed]
                            
                                Why is numpy's einsum faster than numpy's built in functions?
                            
                                How do I access my webcam in Python?
                            
                                Where is the __builtin__ module in Python3? Why was it renamed?
                            
                                Python multiprocessing.Pool: AttributeError
                            
                                SQLAlchemy, get object not bound to a Session
                            
                                How to write CSV output to stdout?
                            
                                Iterating over arbitrary dimension of numpy.array
                            
                                zeromq: how to prevent infinite wait?
                            
                                DRY way to add created/modified by and time
                            
                                pip no longer working after update error 'module' object is not callable
                            
                                Is it possible to insert a row at an arbitrary position in a dataframe using pandas?
                            
                                Finding common rows (intersection) in two Pandas dataframes
                            
                                Easiest way to rm -rf in Python
                            
                                print python stack trace without exception being raised

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With