Getting output from jupyter kernel in (i)python script

I'd like to open several kernels from within a single ipython session, run code on those kernels, and then collect the results. But I can't figure out how to collect the results, or even see stdout/stderr. How can I do these things?

What I've got so far

I've managed the first two steps (open kernels and run code on them) with code like the following:

from jupyter_client import MultiKernelManager
kernelmanager = MultiKernelManager()
kernelmanager = MultiKernelManager()
remote_id = kernelmanager.start_kernel('python3')    # returns the new kernel's id
remote_kernel = kernelmanager.get_kernel(remote_id)  # the KernelManager for that kernel
remote = remote_kernel.client()                      # a client for talking to it
sent_msg_id = remote.execute('2+2')                  # returns the msg_id of the request

[I welcome any suggestions for how to improve that, or for how to close these kernels and clients.]
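My best guess at cleanup, based on a quick read of the jupyter_client source and not fully verified, is to call stop_channels on the client and shutdown_kernel on the manager:

remote.stop_channels()                    # close the client's ZMQ channels
kernelmanager.shutdown_kernel(remote_id)  # ask that one kernel to shut down
# or, to stop every kernel this manager started:
# kernelmanager.shutdown_all()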

Here, python3 can be the name of any of the kernels I have set up (which can be listed at the command line with jupyter-kernelspec list). And I seem to be able to run any reasonable code in place of '2+2'. For example, I can write to a file, and that file really gets created.

Now, the problem is how to get the result. I can get a message that seems related like this:

reply = remote.get_shell_msg(sent_msg_id)
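One thing I've since noticed: get_shell_msg seems to just pop the next message off the shell channel -- as far as I can tell, the argument isn't actually used to look up a particular message. So a reply should presumably be matched to its request through the parent header, something like this:

reply = remote.get_shell_msg(timeout=5)
while reply['parent_header'].get('msg_id') != sent_msg_id:  # skip unrelated replies
    reply = remote.get_shell_msg(timeout=5)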

That reply is a dictionary like this:

{'buffers': [],
 'content': {'execution_count': 2,
  'payload': [],
  'status': 'ok',
  'user_expressions': {}},
 'header': {'date': datetime.datetime(2015, 10, 19, 14, 34, 34, 378577),
  'msg_id': '98e216b4-3251-4085-8eb1-bfceedbae3b0',
  'msg_type': 'execute_reply',
  'session': 'ca4d615d-82b7-487f-88ff-7076c2bdd109',
  'username': 'me',
  'version': '5.0'},
 'metadata': {'dependencies_met': True,
  'engine': '868de9dd-054b-4630-99b7-0face61915a6',
  'started': '2015-10-19T14:34:34.265718',
  'status': 'ok'},
 'msg_id': '98e216b4-3251-4085-8eb1-bfceedbae3b0',
 'msg_type': 'execute_reply',
 'parent_header': {'date': datetime.datetime(2015, 10, 19, 14, 34, 34, 264508),
  'msg_id': '2674c61a-c79a-48a6-b88a-1f2e8da68a80',
  'msg_type': 'execute_request',
  'session': '767ae562-38d6-41a3-a9dc-6faf37d83222',
  'username': 'me',
  'version': '5.0'}}

This is documented in Messaging in Jupyter. What isn't documented is how to actually use it -- i.e., which functions to call, and when and where to look for messages. I've seen this question and its answer, which have useful related information but don't quite get me to the answer. And this answer doesn't get any useful output, either.

So, for example, I've tried to fetch the message with the msg_id given in the reply above, but the call just hangs. I've tried everything I can think of, but can't figure out how to get anything back from the kernel. How do I do it? Can I transfer data back from the kernel as some sort of string? Can I see its stdout and stderr?
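For the record, my current understanding (which I haven't fully pinned down in the docs) is that stdout and stderr never appear on the shell channel at all; they arrive as 'stream' messages on the IOPub channel. A sketch of reading them with the client from above would look something like this:

from queue import Empty  # get_iopub_msg raises this on timeout

sent_msg_id = remote.execute('print("hello"); 2+2')
while True:
    try:
        msg = remote.get_iopub_msg(timeout=5)
    except Empty:
        break
    if msg['parent_header'].get('msg_id') != sent_msg_id:
        continue  # output belonging to some other request
    msg_type = msg['msg_type']
    if msg_type == 'stream':  # stdout or stderr
        print(msg['content']['name'], msg['content']['text'])
    elif msg_type == 'execute_result':  # repr of the last expression
        print(msg['content']['data']['text/plain'])
    elif msg_type == 'error':
        print('\n'.join(msg['content']['traceback']))
    elif msg_type == 'status' and msg['content']['execution_state'] == 'idle':
        break  # the kernel is done with this request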

Background

I'm writing an ipython magic to run a code snippet on remote kernels. [Edit: This now exists and is available here.] The idea is that I'll have a notebook on my laptop, and gather data from several remote servers by just having a little magic cell like this:

%%remote_exec -kernels server1,server2
2+2
! hostname

I use remote_ikernel to connect to those remote kernels easily and automatically. That seems to work just fine; I've got my magic command with all its bells and whistles working great, opening up these remote kernels, and running the code. Now I want to get some of that data sent back from the remote kernels to my laptop -- presumably by serializing it in some way. At the moment, I think pickle.dumps and pickle.loads would be perfect for this part; I just have to get the bytes those functions create from one kernel to the other. I'd rather not use actual files for the pickling, though this would potentially be acceptable.

Edit:

It looks like it's possible with some monstrosity like this:

import pickle  # needed locally to unpickle the reply

remote.get_shell_msg(remote.execute('import pickle'))  # make pickle available remotely
sent_msg_id = remote.execute('a=2+2',
                             user_expressions={'output': 'pickle.dumps({"a":a})'})
reply = remote.get_shell_msg(sent_msg_id)
# text/plain holds the *repr* of the pickled bytes, e.g. "b'...'":
output_bytes = reply['content']['user_expressions']['output']['data']['text/plain']
variable_dict = pickle.loads(eval(output_bytes))

And now, variable_dict['a'] is just 4. Note, however, that output_bytes is a string representing the repr of those bytes, so it has to be evaled. This seems ridiculous (and still doesn't show how I'd get stdout). Is there a better way? And how do I get stdout?
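A marginally less horrible variant, sketched under the same assumptions as above, would be to base64-encode the pickle on the remote side, so that the round trip is a plain ASCII string and ast.literal_eval suffices instead of eval:

import ast, base64, pickle

remote.get_shell_msg(remote.execute('import pickle, base64'))
sent_msg_id = remote.execute(
    'a=2+2',
    user_expressions={'output': "base64.b64encode(pickle.dumps({'a': a})).decode('ascii')"})
reply = remote.get_shell_msg(sent_msg_id)
# text/plain is the repr of the string, e.g. "'gANd...'"; literal_eval strips the quotes
encoded = ast.literal_eval(reply['content']['user_expressions']['output']['data']['text/plain'])
variable_dict = pickle.loads(base64.b64decode(encoded))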

Edit 2:

Though I'm unhappy with my hack above, I have successfully used it to write a little module called remote_exec, hosted on github as described above. The module gives me a little ipython magic that I can use to run code remotely on one or more other kernels. The process is more-or-less automatic, and I'm satisfied with it -- except for the nagging knowledge of what's happening underneath.

asked Oct 19 '15 by Mike
1 Answer

You seem to be reinventing the wheel. You do not want to manage the kernels yourself. Use something like ipyparallel, which is made to spawn many kernels and scatter/gather data (basically you are reinventing how it works). You will likely also be interested in dask; read an introduction from its author. The ipyparallel and dask authors are working together to make the two projects work well with each other. Don't manage the kernels yourself; use ipyparallel instead.
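A minimal sketch of the same idea in ipyparallel, assuming a cluster has already been started (e.g. with ipcluster start -n 2):

import ipyparallel as ipp

rc = ipp.Client()           # connect to the running cluster
dview = rc[:]               # a DirectView over all engines
dview.execute('a = 2 + 2')  # run code on every engine
print(dview['a'])           # gather the results: [4, 4]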

answered Oct 24 '22 by Matt