 

Jupyter notebook never finishes processing using multiprocessing (Python 3)


I am using the multiprocessing module. I am still learning its capabilities, and I am working from the book by Dusty Phillips; this code comes from it.

import multiprocessing
import random
from multiprocessing.pool import Pool

def prime_factor(value):
    factors = []
    for divisor in range(2, value - 1):
        quotient, remainder = divmod(value, divisor)
        if not remainder:
            factors.extend(prime_factor(divisor))
            factors.extend(prime_factor(quotient))
            break
        else:
            factors = [value]
    return factors

if __name__ == '__main__':
    pool = Pool()
    to_factor = [random.randint(100000, 50000000) for i in range(20)]
    results = pool.map(prime_factor, to_factor)
    for value, factors in zip(to_factor, results):
        print("The factors of {} are {}".format(value, factors))

In Windows PowerShell (not in the Jupyter notebook) I see the following:

Process SpawnPoolWorker-5:
Process SpawnPoolWorker-1:
AttributeError: Can't get attribute 'prime_factor' on <module '__main__' (built-in)>

I do not understand why the cell never finishes running.

asked Nov 15 '17 by rsc05


People also ask

How do you exit multiprocessing in Python?

Call kill() on the Process. The method is called on the multiprocessing.Process instance for the process that you wish to terminate.
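A minimal sketch of this, assuming a long-running worker (the sleepy_worker function below is made up for illustration):

import multiprocessing
import time

def sleepy_worker():
    time.sleep(60)  # simulate a long-running job

if __name__ == '__main__':
    p = multiprocessing.Process(target=sleepy_worker)
    p.start()
    time.sleep(1)
    p.kill()   # forcefully terminate the process (available since Python 3.7)
    p.join()
    print(p.exitcode)  # a negative exit code shows the process was killed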

How do I fix Jupyter notebook not running?

Try another browser (e.g. if you normally use Firefox, try Chrome). This helps pin down where the problem is. Try disabling any browser extensions and/or any Jupyter extensions you have installed. Some internet security software can also interfere with Jupyter.

How do locks work in multiprocessing in Python?

Python provides a mutual exclusion lock for use with processes via the multiprocessing.Lock class. An instance of the lock can be created, acquired by a process before it enters a critical section, and released after the critical section. Only one process can hold the lock at any time.
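A minimal sketch, assuming several processes append to the same file (the file name out.txt is arbitrary):

import multiprocessing

def append_line(lock, path, line):
    with lock:  # only one process can hold the lock and write at a time
        with open(path, 'a') as f:
            f.write(line + '\n')

if __name__ == '__main__':
    lock = multiprocessing.Lock()
    procs = [multiprocessing.Process(target=append_line,
                                     args=(lock, 'out.txt', 'line %d' % i))
             for i in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()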

Does multiprocessing make Python faster?

In multiprocessing, multiple Python processes are created and used to execute a function instead of multiple threads, bypassing the Global Interpreter Lock (GIL) that can significantly slow down threaded Python programs.
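As a rough illustration, here is a sketch comparing a thread pool with a process pool on a CPU-bound function (the busy function and the workload sizes are made up; absolute timings will vary by machine):

import time
from multiprocessing import Pool
from multiprocessing.pool import ThreadPool

def busy(n):
    # CPU-bound work that holds the GIL when run in a thread
    total = 0
    for i in range(n):
        total += i * i
    return total

if __name__ == '__main__':
    # run as a script; in a notebook the spawned workers would hit
    # the same import issue discussed in this question
    work = [5_000_000] * 8
    for label, PoolCls in [('threads', ThreadPool), ('processes', Pool)]:
        start = time.perf_counter()
        with PoolCls(4) as pool:
            pool.map(busy, work)
        print(label, time.perf_counter() - start)

On a multi-core machine the process pool should finish noticeably faster, since the thread pool's workers serialize on the GIL.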


2 Answers

It seems the problem in Jupyter Notebook, as in some other IDEs, is a design feature: the spawned worker processes re-import __main__, and in a notebook __main__ is not an importable file, so the workers cannot find prime_factor. Therefore, we have to write the function (prime_factor) into a separate file and import it as a module, adjusting the calling code accordingly. For example, in my case I have put the function into a file called defs.py:

def prime_factor(value):
    factors = []
    for divisor in range(2, value - 1):
        quotient, remainder = divmod(value, divisor)
        if not remainder:
            factors.extend(prime_factor(divisor))
            factors.extend(prime_factor(quotient))
            break
        else:
            factors = [value]
    return factors

Then, in the Jupyter notebook, I wrote the following lines:

import multiprocessing
import random
from multiprocessing import Pool
import defs

if __name__ == '__main__':
    pool = Pool()
    to_factor = [random.randint(100000, 50000000) for i in range(20)]
    results = pool.map(defs.prime_factor, to_factor)
    for value, factors in zip(to_factor, results):
        print("The factors of {} are {}".format(value, factors))

This solved my problem.


answered Sep 20 '22 by rsc05


To execute a function without having to write it into a separate file manually:

We can dynamically write the task to a temporary file, import it, and execute the function.

from multiprocessing import Pool
from functools import partial
import inspect

def parallel_task(func, iterable, *params):
    # Write the source of func into a temporary module, renaming it to "task"
    with open('./tmp_func.py', 'w') as file:
        file.write(inspect.getsource(func).replace(func.__name__, "task"))

    from tmp_func import task

    if __name__ == '__main__':
        func = partial(task, params)
        pool = Pool(processes=8)
        res = pool.map(func, iterable)
        pool.close()
        return res
    else:
        raise RuntimeError("Not in Jupyter Notebook")

We can then simply call it in a notebook cell like this:

def long_running_task(params, id):
    # Heavy job here
    return params, id

data_list = range(8)

for res in parallel_task(long_running_task, data_list, "a", 1, "b"):
    print(res)

Output:

('a', 1, 'b') 0
('a', 1, 'b') 1
('a', 1, 'b') 2
('a', 1, 'b') 3
('a', 1, 'b') 4
('a', 1, 'b') 5
('a', 1, 'b') 6
('a', 1, 'b') 7

Note: If you're using Anaconda and you want to see the progress of the heavy task, you can use print() inside long_running_task(); the output will be displayed in the Anaconda Prompt console.

answered Sep 18 '22 by H4dr1en