
Printed output not displayed when using joblib in jupyter notebook

So I am using joblib to parallelize some code, and I noticed that nothing gets printed when I use it inside a Jupyter notebook.

I tried the same example in IPython and it worked perfectly.

Here is a minimal (non-)working example to run in a Jupyter notebook cell:

from joblib import Parallel, delayed
Parallel(n_jobs=8)(delayed(print)(i) for i in range(10))

So I am getting [None, None, None, None, None, None, None, None, None, None] as the output, but nothing is printed.

What I expect to see (print order could be random in reality):

0
1
2
3
4
5
6
7
8
9
[None, None, None, None, None, None, None, None, None, None]

Note:

You can see the prints in the logs of the notebook process. But I would like the prints to happen in the notebook, not the logs of the notebook process.

EDIT

I have opened a GitHub issue, but it has received minimal attention so far.

asked May 02 '19 by Zaccharie Ramzi

People also ask

Does joblib work in Jupyter notebook?

Yes joblib should work in interactive jupyter sessions (for interactively defined Python functions with picklable arguments).

What does joblib delayed do?

The delayed function is a simple trick to be able to create a tuple (function, args, kwargs) with a function-call syntax. Under Windows, the use of multiprocessing.Pool requires protecting the main loop of code to avoid recursive spawning of subprocesses when using joblib.
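The tuple trick described above can be sketched in a few lines of plain Python (a simplified re-implementation for illustration; joblib's real delayed also handles things like pickling of instance methods):

```python
def delayed(function):
    # calling the returned wrapper does not run `function`;
    # it records the call as a (function, args, kwargs) tuple
    # that Parallel can ship to a worker and execute later
    def delayed_function(*args, **kwargs):
        return function, args, kwargs
    return delayed_function

task = delayed(print)("hello")
# task now holds (print, ("hello",), {}) -- nothing has been printed yet
```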

What is the use of joblib in Python?

Joblib is a set of tools to provide lightweight pipelining in Python. In particular: transparent disk-caching of functions and lazy re-evaluation (the memoize pattern), and easy, simple parallel computing.
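The disk-caching side mentioned here is exposed through joblib.Memory; a minimal sketch (the cache directory is arbitrary, chosen here with tempfile):

```python
import tempfile
from joblib import Memory

# joblib persists cached results to disk under this directory
memory = Memory(tempfile.mkdtemp(), verbose=0)

calls = []  # records how many times the function body actually runs

@memory.cache
def slow_square(x):
    calls.append(x)  # only reached on a cache miss
    return x * x

slow_square(4)  # computed, result written to disk
slow_square(4)  # loaded from the cache; the body does not run again
```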


2 Answers

I think this is caused in part by the way Parallel spawns the child workers, and by how Jupyter Notebook handles IO for those workers. When started without specifying a value for backend, Parallel defaults to loky, which utilizes a pooling strategy that directly uses a fork-exec model to create the subprocesses.

If you start Notebook from a terminal using

$ jupyter-notebook

the regular stderr and stdout streams appear to remain attached to that terminal, while the notebook session will start in a new browser window. Running the posted code snippet in the notebook does produce the expected output, but it seems to go to stdout and ends up in the terminal (as hinted in the Note in the question). This further supports the suspicion that this behavior is caused by the interaction between loky and notebook, and the way the standard IO streams are handled by notebook for child processes.

This led me to this discussion on GitHub (active within the past 2 weeks as of this posting) where the authors of notebook appear to be aware of this, but it would seem that there is no obvious, quick fix for the issue at the moment.

If you don't mind switching the backend that Parallel uses to spawn children, you can do so like this:

from joblib import Parallel, delayed
Parallel(n_jobs=8, backend='multiprocessing')(delayed(print)(i) for i in range(10))

With the multiprocessing backend, things work as expected. threading appears to work fine too. This may not be the solution you were hoping for, but hopefully it is sufficient while the notebook authors work on a proper fix.
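If you'd rather not pass backend= at every call site, joblib also provides a parallel_backend context manager that achieves the same switch (a sketch of the same workaround, not a different fix):

```python
from joblib import Parallel, delayed, parallel_backend

# every Parallel call inside this block uses the multiprocessing
# backend, same as passing backend='multiprocessing' explicitly,
# so the worker prints reach the notebook
with parallel_backend('multiprocessing', n_jobs=2):
    results = Parallel()(delayed(print)(i) for i in range(4))
# print returns None, so results is [None, None, None, None]
```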

I'll cross-post this to GitHub in case anyone there cares to add to this answer (I don't want to misstate anyone's intent or put words in people's mouths!).


Test Environment:
MacOS - Mojave (10.14)
Python - 3.7.3
pip3 - 19.3.1

Tested in 2 configurations. Confirmed to produce the expected output when using both multiprocessing and threading for the backend parameter. Packages installed using pip3.

Setup 1:

ipykernel                               5.1.1
ipython                                 7.5.0
jupyter                                 1.0.0
jupyter-client                          5.2.4
jupyter-console                         6.0.0
jupyter-core                            4.4.0
notebook                                5.7.8

Setup 2:

ipykernel                               5.1.4
ipython                                 7.12.0
jupyter                                 1.0.0
jupyter-client                          5.3.4
jupyter-console                         6.1.0
jupyter-core                            4.6.2
notebook                                6.0.3

I also was successful using the same versions as 'Setup 2' but with the notebook package version downgraded to 6.0.2.

Note:

This approach works inconsistently on Windows. Different combinations of software versions yield different results. Doing the most intuitive thing (upgrading everything to the latest version) does not guarantee it will work.

answered Oct 19 '22 by Z4-tier


In the GitHub discussion linked from Z4-tier's answer, scottgigante's method works on Windows, but opposite to the mentioned results: in a Jupyter notebook the multiprocessing backend hangs forever, while the default loky works well (Python 3.8.5 and notebook 6.1.1):

from joblib import Parallel, delayed
import sys

def g(x):
    # look up sys.stdout inside the worker and flush explicitly,
    # so the output is pushed to the notebook rather than lost
    stream = getattr(sys, "stdout")
    print("{}".format(x), file=stream)
    stream.flush()
    return x

Parallel(n_jobs=2)(delayed(g)(x**2) for x in range(5))

executed in 91ms, finished 11:17:25 2021-05-13
[0, 1, 4, 9, 16]

A simpler method is to pass an identity function to delayed:

import numpy as np
from joblib import Parallel, delayed

Parallel(n_jobs=2)(delayed(lambda y: y)([np.log(x), np.sin(x)]) for x in range(5))
executed in 151ms, finished 09:34:18 2021-05-17
[[-inf, 0.0],
 [0.0, 0.8414709848078965],
 [0.6931471805599453, 0.9092974268256817],
 [1.0986122886681098, 0.1411200080598672],
 [1.3862943611198906, -0.7568024953079282]]

Or, equivalently:

Parallel(n_jobs=2)(delayed(lambda y: [np.log(y), np.sin(y)])(x) for x in range(5))
executed in 589ms, finished 09:44:57 2021-05-17
[[-inf, 0.0],
 [0.0, 0.8414709848078965],
 [0.6931471805599453, 0.9092974268256817],
 [1.0986122886681098, 0.1411200080598672],
 [1.3862943611198906, -0.7568024953079282]]
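These snippets all share one idea: have the workers return data and let the notebook process do the printing. That pattern is backend-agnostic and can be sketched with stdlib threads standing in for joblib workers (an illustration of the pattern, not joblib's API):

```python
from concurrent.futures import ThreadPoolExecutor

def work(i):
    # build the message in the worker, but do not print it there
    return "processed {}".format(i)

with ThreadPoolExecutor(max_workers=2) as pool:
    lines = list(pool.map(work, range(5)))

# printing happens in the notebook process, so it always shows up
for line in lines:
    print(line)
```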
answered Oct 19 '22 by Frank