Python Multiprocessing within Jupyter Notebook

I am new to Python's multiprocessing module, and I work in Jupyter notebooks. I tried the following code snippet from PMOTW:

import multiprocessing

def worker():
    """worker function"""
    print('Worker')
    return

if __name__ == '__main__':
    jobs = []
    for i in range(5):
        p = multiprocessing.Process(target=worker)
        jobs.append(p)
        p.start()

When I run this as is, there is no output.

I have also tried creating a module called worker.py and then importing that to run the code:

import multiprocessing
from worker import worker

if __name__ == '__main__':
    jobs = []
    for i in range(5):
        p = multiprocessing.Process(target=worker)
        jobs.append(p)
        p.start()

There is still no output in that case. In the console, I see the following error (repeated multiple times):

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Program Files\Anaconda3\lib\multiprocessing\spawn.py", line 106, in spawn_main
    exitcode = _main(fd)
  File "C:\Program Files\Anaconda3\lib\multiprocessing\spawn.py", line 116, in _main
    self = pickle.load(from_parent)
AttributeError: Can't get attribute 'worker' on <module '__main__' (built-in)>

However, I get the expected output when the code is saved as a Python script and executed.

What can I do to run this code directly from the notebook without creating a separate script?

asked Feb 17 '18 by curiouscientist



1 Answer

I'm relatively new to parallel computing, so I may be wrong about some technicalities. My understanding is this:

Jupyter notebooks don't work well with multiprocessing because the module pickles (serialises) the target function to send it to each worker process, and a function defined interactively in a notebook cell can't be looked up by the spawned process (hence the AttributeError on `__main__` you're seeing). multiprocess is a fork of multiprocessing that uses dill instead of pickle to serialise data, which allows it to work from within Jupyter notebooks. The API is identical, so the only thing you need to do is change

import multiprocessing

to...

import multiprocess

You can install multiprocess very easily with a simple

pip install multiprocess

You will, however, find that your processes still don't print to the notebook output (although in JupyterLab they will print to the terminal that the server is running in). I stumbled upon this post trying to work around this and will edit this post when I find out how.

answered Sep 21 '22 by Eden Trainor