 

IPython notebook read string from raw text cell

I have a raw text cell in my IPython notebook project.

Is there a way to get the text as a string with a built-in function or something similar?

TM90 asked Oct 18 '14 13:10


2 Answers

My (possibly unsatisfactory) answer is in two parts. It is based on my own investigation of IPython's internal structures, and it's entirely possible I've missed something that answers the question more directly.

Current session

The raw text for code cells entered during the current session is available within a notebook using the list In.

So the raw text of the current cell can be returned by the following expression within the cell:

    In[len(In)-1]

For example, evaluating a cell containing this code:

    print("hello world")
    three = 1+2
    In[len(In)-1]

yields this corresponding Out[] value (a Python 2 kernel shows it with a u'' prefix):

    'print("hello world")\nthree = 1+2\nIn[len(In)-1]'

So, within an active notebook session, you can access the raw text of a cell as In[n], where n is the number shown in that cell's In [n]: prompt.

But if the cell was entered during a previous notebook session that has since been closed and reopened, this no longer works. Also, only code cells are recorded in the In list, so this approach doesn't work for non-code cells such as a raw text cell.
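Within a live session, the In list can also be reached from code that is not typed at the notebook's top level, for example a helper defined in an imported module, where the name In is not in scope. A minimal sketch, assuming the kernel is IPython: get_ipython() returns the running shell (or None outside IPython), and the shell's user_ns dictionary holds the same In list. The helper name cell_text and its history parameter are hypothetical, added so the function can also be exercised with a plain list.

```python
def cell_text(n=-1, history=None):
    """Return the source text of input cell n (default: the most recent one).

    Hypothetical helper. history defaults to the running IPython shell's In
    list; pass any list of strings to use it outside IPython.
    """
    if history is None:
        # Imported lazily so the function can be used without IPython installed.
        from IPython import get_ipython

        shell = get_ipython()  # None when not running under IPython
        if shell is None:
            raise RuntimeError("not running inside IPython")
        history = shell.user_ns["In"]  # the same list the notebook sees as In
    return history[n]
```

Called with no arguments from inside a notebook cell, it returns that cell's own text, just like In[len(In)-1], and like In it only ever sees executed code cells.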

Cells from saved notebook sessions

In my research, the only way I found to get the raw text from previous sessions was to read the notebook file itself. The documentation page "Importing IPython Notebooks as Modules" describes how to do this; the key code is in its In[4]:

    # load the notebook object stored at `path` (IPython 2.x-era API)
    import io
    from IPython.nbformat import current

    with io.open(path, 'r', encoding='utf-8') as f:
        nb = current.read(f, 'json')

where current is the IPython.nbformat.current module, whose API is described at Module: nbformat.current.

The notebook object returned is accessed as a nested dictionary and list structure, e.g.:

    for cell in nb.worksheets[0].cells:
        ...

The cell objects thus enumerated have two key fields for the purpose of this question:

  1. cell.cell_type is the type of the cell ("code", "markdown", "raw", etc.).

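  2. cell.input (for code cells) or cell.source (for markdown and raw cells) is the text content of the cell. In the saved JSON file it is stored as a list of strings, one entry per line of text; the reader joins these back into a single string.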

Much of this can be seen by looking at the JSON data that makes up a saved IPython notebook.
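Because a saved notebook is plain JSON, raw cells can also be pulled out with only the standard json module, without going through the IPython API. A minimal sketch, assuming the version 3 layout used by IPython 2.x: cells sit inside a "worksheets" list, raw and markdown cells keep their text in "source", and code cells keep it in "input". The helper name raw_cells_v3 is hypothetical.

```python
import json


def raw_cells_v3(path):
    """Return the text of every raw cell in a version 3 .ipynb file."""
    with open(path, encoding="utf-8") as f:
        nb = json.load(f)
    texts = []
    # v3 notebooks nest their cells inside one (in practice) worksheet.
    for ws in nb.get("worksheets", []):
        for cell in ws.get("cells", []):
            if cell.get("cell_type") == "raw":
                src = cell.get("source", "")
                # On disk the text is usually a list of lines that keep their "\n".
                texts.append("".join(src) if isinstance(src, list) else src)
    return texts
```

This returns the raw cells in notebook order, which is still a positional reference: it breaks as soon as cells are inserted or reordered.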

Apart from the "prompt number" fields of code cells, which change whenever the cell is re-evaluated, I could find no way to create a stable reference to a notebook cell.

Conclusion

I couldn't find an easy answer to the original question. What I found is covered above. Without knowing the motivation behind the original question, I can't tell whether it's enough.

What I looked for, but was unable to identify, was a way to reference the current notebook that can be used from within the notebook itself (e.g. via a function like get_ipython()). That doesn't mean it doesn't exist.

The other missing piece in my response is any stable way to refer to a specific cell. Looking at the notebook file format, raw text cells consist solely of a cell type ("raw") and the raw text itself, though it appears that cell metadata may also be included. This suggests the only way to reference a cell directly is through its position in the notebook, but that is subject to change when the notebook is edited.
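Later versions of the notebook format do provide handles that survive edits. A hedged sketch, assuming the version 4 format: a cell's metadata may carry a list of "tags" (a standard metadata key), and from nbformat 4.5 onward every cell also has an "id" field. The helper find_raw_cell and the tag name "config" are hypothetical; the notebook is assumed to be already loaded as a dict, for example with json.load.

```python
def find_raw_cell(nb, tag=None, cell_id=None):
    """Return the text of the first raw cell matching a tag or a cell id.

    nb is a version 4 notebook loaded as a plain dict. Returns None if no
    raw cell matches.
    """
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "raw":
            continue
        tags = cell.get("metadata", {}).get("tags", [])
        if (tag is not None and tag in tags) or (
            cell_id is not None and cell.get("id") == cell_id
        ):
            src = cell.get("source", "")
            # "source" may be one string or a list of lines keeping their "\n".
            return "".join(src) if isinstance(src, list) else src
    return None
```

For example, find_raw_cell(nb, tag="config") keeps working after cells above the tagged one are added or removed, which positional indexing does not.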

(Researched and answered as part of the Oxford participation in http://aaronswartzhackathon.org)

Graham Klyne answered Sep 19 '22 19:09


I can't comment yet because of my low reputation, so I'm posting this update to Graham Klyne's answer as a separate answer, in case someone else stumbles onto this. (IPython has no updated documentation on this yet.)

  1. Use nbformat instead of IPython.nbformat.current.
  2. The worksheets attribute is gone, so use cells directly.

Here is an example of what the updated code looks like: https://github.com/ldiary/marigoso/blob/master/marigoso/NotebookImport.py
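To make the difference from the older layout concrete, here is a minimal sketch of reading raw cells from a version 4 file with only the standard json module; raw_cells_v4 is a hypothetical name. With the nbformat package, the equivalent starting point is nbformat.read(path, as_version=4), whose result exposes the same top-level cells list, each cell having cell_type and source.

```python
import json


def raw_cells_v4(path):
    """Return the text of every raw cell in a version 4 .ipynb file."""
    with open(path, encoding="utf-8") as f:
        nb = json.load(f)
    # In v4 there is no "worksheets" level: cells sit at the top level,
    # and every cell type keeps its text in "source".
    texts = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "raw":
            src = cell.get("source", "")
            # On disk "source" may be a single string or a list of lines.
            texts.append("".join(src) if isinstance(src, list) else src)
    return texts
```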

ldiary answered Sep 19 '22 19:09