Python Multiprocessing within Jupyter Notebook

I am new to Python's multiprocessing module, and I work in Jupyter notebooks. I tried the following code snippet from PMOTW:

import multiprocessing

def worker():
    """worker function"""
    print('Worker')
    return

if __name__ == '__main__':
    jobs = []
    for i in range(5):
        p = multiprocessing.Process(target=worker)
        jobs.append(p)
        p.start()

When I run this as is, there is no output.

I have also tried creating a module called worker.py and then importing that to run the code:

import multiprocessing
from worker import worker

if __name__ == '__main__':
    jobs = []
    for i in range(5):
        p = multiprocessing.Process(target=worker)
        jobs.append(p)
        p.start()

There is still no output in that case. In the console, I see the following error (repeated multiple times):

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Program Files\Anaconda3\lib\multiprocessing\spawn.py", line 106, in spawn_main
    exitcode = _main(fd)
  File "C:\Program Files\Anaconda3\lib\multiprocessing\spawn.py", line 116, in _main
    self = pickle.load(from_parent)
AttributeError: Can't get attribute 'worker' on <module '__main__' (built-in)>

However, I get the expected output when the code is saved as a Python script and executed.

What can I do to run this code directly from the notebook without creating a separate script?

asked Feb 17 '18 by curiouscientist



1 Answer

I'm relatively new to parallel computing, so I may be wrong about some technicalities. My understanding is this:

Jupyter notebooks don't work well with multiprocessing because the module pickles (serialises) the target function to send it to each worker process, and a function defined interactively in a notebook cell can't be looked up by the spawned process (hence the AttributeError on `__main__` you're seeing). multiprocess is a fork of multiprocessing that uses dill instead of pickle to serialise data, which allows it to work from within Jupyter notebooks. The API is identical, so the only thing you need to do is change

import multiprocessing

to...

import multiprocess

You can install multiprocess very easily with a simple

pip install multiprocess

You will, however, find that your processes still don't print to the notebook output (although in JupyterLab they will print to the terminal that the server is running in). I stumbled upon this post trying to work around this and will edit this post when I find out how.

answered Sep 21 '22 by Eden Trainor