
How to split an IPython notebook

My Jupyter notebook is getting long, which makes it difficult to navigate.

I want to save each chapter (starting at a cell with Heading 1) to a different file. How can I do that? Cutting and pasting multiple cells between notebooks does not seem to be possible.

sjdh asked Sep 01 '14 03:09



2 Answers

This is the method I use - it is a little awkward, but it works:

  1. Make multiple copies of the master notebook using File->Make a Copy from the menu, one copy for each chapter you want to extract.
  2. Rename the copy for each chapter, e.g. rename "master-copy0" to "Chapter 1".
  3. Delete the cells that don't belong to Chapter 1, for example with 'dd' in command mode.
  4. Save the abbreviated file.
  5. Repeat steps 3 and 4 for each chapter.
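
The copy-and-delete steps above can also be scripted, because an .ipynb file is plain JSON. This is only a minimal sketch, assuming the nbformat 4 layout (cells stored in a top-level "cells" list). The names keep_cells and save_chapter are hypothetical, and you still have to look up the cell indices of each chapter yourself:

```python
import copy
import json


def keep_cells(notebook, start, stop):
    """Return a copy of `notebook` holding only cells[start:stop].

    `notebook` is the dict obtained from json.load() on an .ipynb file
    (nbformat 4 layout assumed). The original dict is left untouched,
    which mirrors "make a copy, then delete the other cells".
    """
    trimmed = copy.deepcopy(notebook)
    trimmed["cells"] = trimmed["cells"][start:stop]
    return trimmed


def save_chapter(src_path, dst_path, start, stop):
    """Read src_path, keep cells[start:stop], and write the result to dst_path."""
    with open(src_path, encoding="utf-8") as f:
        notebook = json.load(f)
    with open(dst_path, "w", encoding="utf-8") as f:
        json.dump(keep_cells(notebook, start, stop), f, ensure_ascii=False, indent=1)
```

For example, save_chapter("master.ipynb", "Chapter 1.ipynb", 0, 12) would write the first twelve cells to a new file; the file names and indices here are placeholders.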

I believe that the developers may be working on a better solution for a future release.

David Smith answered Sep 21 '22 12:09


A notebook file is JSON, so I load it as JSON and split it into several files automatically.

This is the code I wrote.

It may look complex, but it is simple once you read through it. Here is an example of one of the split files, which I also converted to HTML after splitting: http://www.fun-coding.org/DS&AL4-1.html

import json
import re

def notebook_spliter(FILENAME, chapter_num):
    # Load the master notebook (an .ipynb file is plain JSON).
    with open(FILENAME + '.ipynb', encoding='utf-8') as data_file:
        data = json.load(data_file)

    copy_cell, chapter_in = list(), False

    # A chapter starts at a markdown cell whose first line begins with "## <number>. "
    regx = re.compile(r"## ([0-9]+)\. ")
    for cell in data['cells']:
        if cell['cell_type'] != 'markdown':
            # Non-markdown cells belong to the chapter that is currently open.
            if chapter_in:
                copy_cell.append(cell)
            continue

        # 'source' may be a list of lines or a single string, and may be empty.
        source = cell['source']
        first_line = source if isinstance(source, str) else (source[0] if source else '')
        regx_result = regx.match(first_line)

        if regx_result:
            found = int(regx_result.group(1))
            if not chapter_in:
                if found == chapter_num:
                    chapter_in = True
                    copy_cell.append(cell)
            elif found != chapter_num:
                break  # the next chapter begins here
        elif chapter_in:
            copy_cell.append(cell)

    # Carry over the notebook-level fields so the output is a valid notebook.
    copy_data = {
        "cells": copy_cell,
        "metadata": data["metadata"],
        "nbformat": data["nbformat"],
        "nbformat_minor": data["nbformat_minor"],
    }
    with open(FILENAME + '-' + str(chapter_num) + '.ipynb', 'w', encoding='utf-8') as fd:
        json.dump(copy_data, fd, ensure_ascii=False)

This function extracts one chapter from a notebook file. I marked each chapter in the notebook with a heading like '## 1. chapter name' in a markdown cell, so the function just checks for the '## digit. ' pattern with a regular expression.
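
To see what that heading check accepts, here is a small standalone sketch. It uses the pattern from the answer with a capture group added for the number; the helper name chapter_number and the sample headings are made up for illustration:

```python
import re

# "## ", one or more digits, a period, then a space, anchored at the start of the line.
CHAPTER_HEADING = re.compile(r"## ([0-9]+)\. ")


def chapter_number(first_line):
    """Return the chapter number if first_line is a chapter heading, else None."""
    match = CHAPTER_HEADING.match(first_line)
    return int(match.group(1)) if match else None


print(chapter_number("## 3. Linked lists\n"))  # 3
print(chapter_number("### 3.1 Details\n"))     # None (three '#' characters, not two)
print(chapter_number("Some text ## 3. x"))     # None (pattern must start the line)
```

Because re.match only matches at the beginning of the string, a heading marker that appears later in the cell's first line is ignored.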

The next piece of code collects the chapter numbers, then calls the function for each one, so that the cells of that chapter, together with the notebook-level fields (metadata, nbformat, and nbformat_minor), are saved to a separate file.

copy_data = dict()
FILENAME = 'DS&AL1'
CHAPTERS = list()
# Heading pattern "## <number>. "; it has to be defined here because this loop uses it.
regx = re.compile(r"## ([0-9]+)\. ")
with open(FILENAME + '.ipynb', encoding='utf-8') as data_file:
    data = json.load(data_file)

# Collect the number of every chapter heading found in a markdown cell.
for cell in data['cells']:
    if cell['cell_type'] == 'markdown' and cell['source']:
        regx_result = regx.match(cell['source'][0])
        if regx_result:
            CHAPTERS.append(int(regx_result.group(1)))
print(CHAPTERS)

# Write one notebook per chapter.
for chapternum in CHAPTERS:
    notebook_spliter(FILENAME, chapternum)
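
For the layout described in the question, where each chapter starts with a level-1 heading, the same idea can be done in a single pass over the cells. This is a sketch under assumptions, not a tested tool: it assumes nbformat 4 JSON and that a chapter begins at any markdown cell whose first line starts with "# ". The helper names first_line, split_by_heading and write_chapters are made up here for illustration:

```python
import json


def first_line(cell):
    """First line of a cell's source; 'source' may be a list of lines or one string."""
    source = cell.get("source", "")
    text = source if isinstance(source, str) else "".join(source)
    return text.split("\n", 1)[0]


def split_by_heading(notebook, prefix="# "):
    """Group cells into chapters, starting a new chapter at every markdown
    cell whose first line begins with `prefix`. Cells before the first
    heading form their own leading group."""
    chapters = []
    for cell in notebook["cells"]:
        starts_chapter = (cell["cell_type"] == "markdown"
                          and first_line(cell).startswith(prefix))
        if starts_chapter or not chapters:
            chapters.append([])
        chapters[-1].append(cell)
    return chapters


def write_chapters(path_stem):
    """Write path_stem-1.ipynb, path_stem-2.ipynb, ... from path_stem.ipynb."""
    with open(path_stem + ".ipynb", encoding="utf-8") as f:
        notebook = json.load(f)
    for number, cells in enumerate(split_by_heading(notebook), start=1):
        # Shallow copy: metadata and nbformat fields are carried over unchanged.
        part = dict(notebook, cells=cells)
        with open(f"{path_stem}-{number}.ipynb", "w", encoding="utf-8") as f:
            json.dump(part, f, ensure_ascii=False, indent=1)
```

Note that if the notebook has cells before its first heading, those cells end up in the first output file, so the file numbers are positions in the notebook rather than chapter numbers.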

Dave Lee answered Sep 20 '22 12:09