
I am using the multiprocessing module and am still learning its capabilities. I am working through the book by Dusty Phillips, and this code comes from it.
import multiprocessing
import random
from multiprocessing.pool import Pool

def prime_factor(value):
    factors = []
    for divisor in range(2, value-1):
        quotient, remainder = divmod(value, divisor)
        if not remainder:
            factors.extend(prime_factor(divisor))
            factors.extend(prime_factor(quotient))
            break
    else:
        # for/else: reached only when no divisor was found, so value is prime
        factors = [value]
    return factors

if __name__ == '__main__':
    pool = Pool()
    to_factor = [ random.randint(100000, 50000000) for i in range(20)]
    results = pool.map(prime_factor, to_factor)
    for value, factors in zip(to_factor, results):
        print("The factors of {} are {}".format(value, factors))

In the Windows PowerShell window (not in the Jupyter notebook itself) I see the following:
Process SpawnPoolWorker-5:
Process SpawnPoolWorker-1:
AttributeError: Can't get attribute 'prime_factor' on <module '__main__' (built-in)>

I do not know why the cell never finishes running.
To stop a process, call kill() on it. The method is called on the multiprocessing.Process instance for the process that you wish to terminate.
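A minimal sketch of that call, assuming a hypothetical never-ending worker named sleep_forever (it is not part of the question's code). Process.kill() is available in Python 3.7 and later. On Windows this is best run as a script rather than from a notebook cell, so that the worker can import the target function.

```python
import multiprocessing
import time


def sleep_forever():
    """Hypothetical stand-in for a task that never finishes on its own."""
    while True:
        time.sleep(1)


def start_and_kill():
    """Start a worker, kill it, and report its final state."""
    process = multiprocessing.Process(target=sleep_forever)
    process.start()
    process.kill()   # ask the OS to stop the child immediately
    process.join()   # wait until the child has actually exited
    return process.is_alive(), process.exitcode


if __name__ == "__main__":
    alive, exitcode = start_and_kill()
    print("still alive:", alive, "exit code:", exitcode)
```

The join() after kill() matters: without it, is_alive() may still report True for a short moment while the operating system tears the process down.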
Try in another browser (e.g. if you normally use Firefox, try with Chrome). This helps pin down where the problem is. Try disabling any browser extensions and/or any Jupyter extensions you have installed. Some internet security software can interfere with Jupyter.
Python provides a mutual exclusion lock for use with processes via the multiprocessing.Lock class. An instance of the lock can be created, acquired by a process before it enters a critical section, and released after the critical section. Only one process can hold the lock at any time.
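As an illustration, the sketch below guards a shared counter with a multiprocessing.Lock. The names add_many and locked_count are hypothetical, and the counter is created with lock=False on purpose so that the explicit lock is the only protection.

```python
import multiprocessing


def add_many(counter, lock, times):
    """Increment the shared counter `times` times, one locked step at a time."""
    for _ in range(times):
        with lock:               # acquire; released automatically on exit
            counter.value += 1   # critical section


def locked_count(workers=4, times=1000):
    """Run several processes that all update one shared integer."""
    lock = multiprocessing.Lock()
    counter = multiprocessing.Value("i", 0, lock=False)  # raw shared int
    processes = [
        multiprocessing.Process(target=add_many, args=(counter, lock, times))
        for _ in range(workers)
    ]
    for process in processes:
        process.start()
    for process in processes:
        process.join()
    return counter.value


if __name__ == "__main__":
    print(locked_count())
```

Because every increment happens while the lock is held, the final value equals workers * times. Without the lock, the read-modify-write in counter.value += 1 could interleave between processes and lose updates.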
In multiprocessing, multiple Python processes (rather than multiple threads) are created and used to execute a function. Each process has its own interpreter, which bypasses the Global Interpreter Lock (GIL) that can significantly slow down threaded, CPU-bound Python programs.
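To make the "separate processes" point concrete, here is a small sketch in which each result is tagged with the PID of the process that computed it. The functions sum_of_squares and tagged_sum are made-up examples, and the pool size of 2 is arbitrary.

```python
import os
from multiprocessing import Pool


def sum_of_squares(n):
    """CPU-bound stand-in: pure Python arithmetic that holds the GIL."""
    return sum(i * i for i in range(n + 1))


def tagged_sum(n):
    """Return the result together with the PID of the process that ran it."""
    return os.getpid(), sum_of_squares(n)


if __name__ == "__main__":
    with Pool(processes=2) as pool:
        for pid, total in pool.map(tagged_sum, [10, 100, 1000]):
            print(f"worker {pid} computed {total}")
    print("parent process is", os.getpid())
```

The worker PIDs printed should differ from the parent's PID, which shows that the work ran in other interpreters, each with its own GIL.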
It seems that this behaviour in Jupyter Notebook, as in some other IDEs, is a design feature rather than a bug: the worker processes must be able to import the function they run, and a function defined in a notebook cell exists only in the interactive __main__. Therefore, we have to write the function (prime_factor) into a separate file and import it as a module, adjusting the calling code accordingly. In my case, I put the function into a file named defs.py:
def prime_factor(value):
    factors = []
    for divisor in range(2, value-1):
        quotient, remainder = divmod(value, divisor)
        if not remainder:
            factors.extend(prime_factor(divisor))
            factors.extend(prime_factor(quotient))
            break
    else:
        # for/else: reached only when no divisor was found, so value is prime
        factors = [value]
    return factors

Then, in the Jupyter notebook, I wrote the following lines:
import multiprocessing
import random
from multiprocessing import Pool
import defs

if __name__ == '__main__':
    pool = Pool()
    to_factor = [ random.randint(100000, 50000000) for i in range(20)]
    results = pool.map(defs.prime_factor, to_factor)
    for value, factors in zip(to_factor, results):
        print("The factors of {} are {}".format(value, factors))

This solved my problem.
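A likely reason the import-based fix works: Pool sends the function to its workers with pickle, and a pickled function is only a reference to a module name plus a function name. The sketch below uses json.dumps purely as a stand-in for defs.prime_factor (an assumption for illustration) to show what such a payload contains.

```python
import json
import pickle

# Pickling a function does not copy its code; the payload only names
# the module and the function so the receiver can import it.
payload = pickle.dumps(json.dumps)
print(payload)

# Unpickling resolves that name by importing the module again.
restored = pickle.loads(payload)
print(restored is json.dumps)
```

A worker started on Windows can resolve defs.prime_factor this way because defs.py is importable, whereas a function that exists only in the notebook's __main__ cannot be found, which matches the AttributeError in the question.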

To execute a function without having to write it into a separate file manually, we can dynamically write the task to a temporary file, import it, and execute the function:
from multiprocessing import Pool
from functools import partial
import inspect

def parallal_task(func, iterable, *params):

    # Write the function's source to a module that worker processes can import
    with open('./tmp_func.py', 'w') as file:
        file.write(inspect.getsource(func).replace(func.__name__, "task"))

    from tmp_func import task

    if __name__ == '__main__':
        # params is passed as one tuple in the first positional argument
        func = partial(task, params)
        pool = Pool(processes=8)
        res = pool.map(func, iterable)
        pool.close()
        return res
    else:
        # Raising a plain string is a TypeError in Python 3
        raise RuntimeError("Not in Jupyter Notebook")

We can then simply call it in a notebook cell like this:
def long_running_task(params, id):
    # Heavy job here
    return params, id

data_list = range(8)

for res in parallal_task(long_running_task, data_list, "a", 1, "b"):
    print(res)

Output:
(('a', 1, 'b'), 0)
(('a', 1, 'b'), 1)
(('a', 1, 'b'), 2)
(('a', 1, 'b'), 3)
(('a', 1, 'b'), 4)
(('a', 1, 'b'), 5)
(('a', 1, 'b'), 6)
(('a', 1, 'b'), 7)

Note: If you're using Anaconda and want to see the progress of the heavy task, you can call print() inside long_running_task(). The printed text will appear in the Anaconda Prompt console rather than in the notebook.
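One caveat with the temporary-file approach: Python caches imported modules, so a second call with a different function may still pick up the first tmp_func.task. The sketch below reloads the module on every call. The helper name load_task and its parameters are hypothetical. It takes the source as a string (for example, what inspect.getsource(func) returns) and adds the target directory to sys.path so the module can be found by name.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def load_task(source, func_name, module_name="tmp_func", directory="."):
    """Write `source` to <directory>/<module_name>.py and import it fresh."""
    directory = str(Path(directory).resolve())
    Path(directory, f"{module_name}.py").write_text(source)
    if directory not in sys.path:
        sys.path.insert(0, directory)   # make the module importable by name
    importlib.invalidate_caches()       # notice the newly written file
    if module_name in sys.modules:
        # Already imported once: re-execute the file instead of reusing it
        module = importlib.reload(sys.modules[module_name])
    else:
        module = importlib.import_module(module_name)
    return getattr(module, func_name)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        first = load_task("def task(x):\n    return x + 1\n", "task", directory=workdir)
        second = load_task("def task(x):\n    return x * 10\n", "task", directory=workdir)
        print(first(1), second(1))
```

The returned function still belongs to a module with a real name, so it can be handed to pool.map() in the same way as task in the answer above.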