Per-cell output for threaded IPython Notebooks

I don't want to raise this as an issue, because it seems like a completely unreasonable feature request for what is a fairly amazing tool. But if any readers happen to be familiar with the architecture I'd be interested to know if a potential extension seems feasible.

I recently wrote a notebook with some simple threaded code in it, just to see what would happen when I ran it. The notebook code (tl;dr it starts a number of parallel threads that print in a sleep loop) is available at https://gist.github.com/4562840.

By hitting SHIFT-RETURN a few times as the code runs you can observe that any output from the kernel appears in the output area of the current cell, not that of the cell in which the code was run.

I was wondering if it would be possible, if threads were active for a cell, to display a "refresh" button allowing the output area to be updated asynchronously. Ideally the refresh button would disappear if it was clicked after all threads had ended (after a final update).

This would depend, though, on being able to identify and intercept the print output for each thread and direct it to a buffer for the specific cell's output. So, two questions.

  1. Am I correct in believing that the hard-wiring of Python 2's print statement means that this enhancement cannot be implemented with a standard interpreter?

  2. Are the prospects for Python 3 any better, given that it's possible to sneak another layer into the print() stack inside the IPython kernel?

And, especially for those who didn't follow a Python link to get here,

  3. [nobody expects the Spanish Inquisition] More generally, can you point to (language-agnostic) examples of multiple streams being delivered into a page? Are there any established best practices for constructing and modifying the DOM to handle this?

holdenweb asked Jan 18 '13 07:01




1 Answer

UPDATE:

Am I correct in believing that the hard-wiring of Python 2's print statement means that this enhancement cannot be implemented with a standard interpreter?

No, the important parts of the print statement are not hardwired at all. print simply writes to sys.stdout, which can be any object with write and flush methods. IPython already completely replaces this object in order to get stdout to the notebook in the first place (see below).
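To make that concrete, here is a minimal sketch. It is not IPython's OutStream; the names `ListStream` and `capture_print` are made up for this illustration. It only shows that print goes through whatever object is currently bound to sys.stdout, as long as that object has write and flush methods:

```python
import sys


class ListStream(object):
    """Hypothetical stand-in for sys.stdout: collects writes in a list."""

    def __init__(self):
        self.chunks = []

    def write(self, text):
        self.chunks.append(text)

    def flush(self):
        # nothing is buffered elsewhere, so there is nothing to do
        pass


def capture_print(value):
    """Print `value` while sys.stdout is swapped out; return what was written."""
    stream = ListStream()
    saved = sys.stdout
    sys.stdout = stream
    try:
        print(value)
    finally:
        # always restore the real stdout, even if print raises
        sys.stdout = saved
    return "".join(stream.chunks)
```

Calling `capture_print("hello")` returns the text that print produced instead of showing it, which is the same hook IPython relies on.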

Are the prospects for Python 3 any better, given that it's possible to sneak another layer into the print() stack inside the IPython kernel?

Nope - overriding sys.stdout is all you need, not print itself (see above, below, and elsewhere). There are no advantages to Python 3 here.

[nobody expects the Spanish Inquisition] More generally, can you point to (language-agnostic) examples of multiple streams being delivered into a page?

Sure - the IPython notebook itself. It uses message IDs and metadata to determine the origin of stdout messages, and in turn where those messages should end up. Below, in my original answer to a question that apparently nobody asked, I show an example of simultaneously drawing output coming from multiple cells whose threads are running concurrently.

In order to get the refresh behavior you desire, you would probably need to do two things:

  1. replace sys.stdout with your own object that uses the IPython display protocol to send messages with your own thread-identifying metadata (e.g. threading.current_thread().ident). This should be done in a context manager (as below), so it only affects the print statements you actually want it to.
  2. write an IPython js plugin for handling your new format of stdout messages, so that they are not drawn immediately, but rather stored in arrays, waiting to be drawn.
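A rough sketch of the first step, under loud assumptions: `ThreadBufferedStream` and its `drain` method are hypothetical names, and the buffers are kept on the Python side purely for illustration. The real thing would send tagged messages through IPython and do the buffering in the browser, which is not shown here.

```python
import threading
from collections import defaultdict


class ThreadBufferedStream(object):
    """Hypothetical sys.stdout replacement that files each write
    under the ident of the thread that made it."""

    def __init__(self):
        self._lock = threading.Lock()
        self._buffers = defaultdict(list)

    def write(self, text):
        ident = threading.current_thread().ident
        with self._lock:
            self._buffers[ident].append(text)

    def flush(self):
        # output is only released when someone asks for it via drain()
        pass

    def drain(self, ident):
        """Return and clear everything the given thread has written so far.

        This is what a "refresh" button would trigger.
        """
        with self._lock:
            return "".join(self._buffers.pop(ident, []))
```

One caveat with this design: Python may recycle a thread ident once the thread has exited, so a long-lived implementation would probably want its own per-thread key rather than the raw ident.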

Original answer (wrong, but related question):

It relies on some shenanigans, and private APIs, but this is totally possible with current IPython (it may not be forever).

Here is an example notebook: http://nbviewer.ipython.org/4563193

In order to do this, you need to understand how IPython gets stdout to the notebook in the first place. This is done by replacing sys.stdout with an OutStream object. This buffers data, and then sends it over zeromq when sys.stdout.flush is called, and it ultimately ends up in the browser.
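The buffer-then-send behaviour can be modelled in a few lines. This is a simplified sketch, not the actual OutStream code: `BufferedSender` is a made-up name, and the `send` callback stands in for the ZeroMQ publish step.

```python
class BufferedSender(object):
    """Hypothetical model of a buffering stream.

    `send` is any callable taking one string; here it plays the
    role of "publish this over the wire".
    """

    def __init__(self, send):
        self._send = send
        self._chunks = []

    def write(self, text):
        # writes only accumulate; nothing leaves the process yet
        self._chunks.append(text)

    def flush(self):
        # everything written since the last flush goes out as one message
        if self._chunks:
            data = "".join(self._chunks)
            self._chunks = []
            self._send(data)
```

The point to notice is that whatever routing information is attached to a message is decided at flush time, not at write time.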

Now, how to send output to a particular cell.

The IPython message protocol uses a 'parent' header to identify which request produced which reply. Every time you ask IPython to run some code, it sets the parent header of various objects (sys.stdout included), so that their side effect messages are associated with the message that caused them. When you run code in a thread, that means that the current parent_header is just the most recent execute_request, rather than the original one that started any given thread.
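A toy model may help here. It assumes a heavily simplified message layout (just a `parent_header` with a `msg_id`, plus some text), and `ToyKernel` and `route_outputs` are invented names, not IPython APIs. It shows why output from a thread started in one cell lands in whichever cell was executed most recently:

```python
class ToyKernel(object):
    """Hypothetical kernel with a single shared 'current parent'."""

    def __init__(self):
        self.parent_header = {}
        self.published = []

    def execute_request(self, msg_id):
        # every new execution overwrites the shared parent header
        self.parent_header = {"msg_id": msg_id}

    def emit(self, text):
        # output is stamped with whatever the parent is right now
        self.published.append(
            {"parent_header": self.parent_header, "content": {"text": text}}
        )


def route_outputs(messages):
    """Group output text by the msg_id of each message's parent header,
    the way a frontend would pick a destination cell."""
    cells = {}
    for msg in messages:
        parent_id = msg["parent_header"]["msg_id"]
        cells.setdefault(parent_id, []).append(msg["content"]["text"])
    return cells
```

If a thread started by the first request emits output after a second `execute_request`, `route_outputs` files that output under the second request, which is the behaviour described in the question.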

With that in mind, here is a context manager that temporarily sets stdout's parent header to a particular value:

import sys
import threading
from contextlib import contextmanager


stdout_lock = threading.Lock()

@contextmanager
def set_stdout_parent(parent):
    """a context manager for setting a particular parent for sys.stdout

    the parent determines the destination cell of output
    """
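    # we need a lock, so that other threads don't snatch control
    # while we have set a temporary parent
    with stdout_lock:
        # read the current parent inside the lock, so that we never save
        # another thread's temporary parent by mistake
        save_parent = sys.stdout.parent_header
        sys.stdout.parent_header = parent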
        try:
            yield
        finally:
            # the flush is important, because that's when the parent_header actually has its effect
            sys.stdout.flush()
            sys.stdout.parent_header = save_parent

And here is a Thread that records the parent when the thread starts, and applies that parent each time it makes a print statement, so it behaves as if it were still in the original cell:

import sys
import threading
import time

class counterThread(threading.Thread):
    def run(self):
        # record the parent when the thread starts
        thread_parent = sys.stdout.parent_header
        for i in range(3):
            time.sleep(2)
            # then ensure that the parent is the same as when the thread started
            # every time we print
            with set_stdout_parent(thread_parent):
                # print(i) gives the same output on Python 2 and Python 3
                print(i)

And finally, a notebook tying it all together, with timestamps showing actual concurrent printing to multiple cells:

http://nbviewer.ipython.org/4563193/

minrk answered Sep 27 '22 22:09
