
Python Simple Loop Parallelization Jupyter Notebook

I am trying to parallelize a simple Python loop in a Jupyter Notebook. I tried to use Pool, but it hangs forever and I have to kill the notebook to stop it.

def process_frame(fl):
    new_dict = dict()
    pc_dict = calculate_area(fl)
    for key in pc_dict:
        if key not in new_dict:
            new_dict[key] = 0
        new_dict[key] = float(sum(pc_dict[key]))
    full_pc_dict[fl] = new_dict

frames_list = [0, 1, 2, 3, 4, 5, 6]

I want to run process_frame for each frame in frames_list.

Note that the final outcome should be a dict containing the outputs of all the process_frame calls. I don't know whether adding each result to it at the end of the function is a good idea.

Any suggestion on how to do this using Jupyter Notebook? Also, is it possible to have tqdm working with this parallel processing?

Kind regards

vftw asked Jan 26 '23 00:01


1 Answer

[UPDATED]
If you want to use multiprocessing inside a Jupyter notebook, use the multiprocess package instead of the built-in multiprocessing module. There is a known issue with functions defined in the notebook itself (its __main__ namespace): worker processes started by multiprocessing often cannot find them, which is a common reason for the pool hanging.

Create a separate .py file with your magic function. To do that from inside the notebook, use something like this in a separate code cell:

%%writefile magic_functions.py

def magic_function(f):
    return f+10

def process_frame(f):
    # changed your logic here as I couldn't repro it
    return f, magic_function(f)

OUT: Writing magic_functions.py

And then run your code in parallel:

from tqdm import tqdm

from multiprocess import Pool
from magic_functions import process_frame

frames_list = [0, 1, 2, 3, 4, 5, 6]

max_pool = 5

with Pool(max_pool) as p:
    pool_outputs = list(
        tqdm(
            p.imap(process_frame,
                   frames_list),
            total=len(frames_list)
        )
    )    

print(pool_outputs)
new_dict = dict(pool_outputs)

print("dict:", new_dict)

OUT:

100%|████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:00<00:00, 37.63it/s]

[(0, 10), (1, 11), (2, 12), (3, 13), (4, 14), (5, 15), (6, 16)]
dict: {0: 10, 1: 11, 2: 12, 3: 13, 4: 14, 5: 15, 6: 16}
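
To get back to the original function: worker processes do not share the notebook's variables, so writing to full_pc_dict inside process_frame would not be visible in the notebook afterwards. A rough sketch of one way around that is to return a (frame, result) pair and build the dict from the collected outputs. Note that calculate_area below is a made-up stand-in, since the real one is not shown in the question, and the loop runs serially here just to keep the example self-contained.

```python
# Hypothetical stand-in for calculate_area (the real one is not in the question).
def calculate_area(fl):
    return {"a": [fl, fl + 1], "b": [2 * fl]}


def process_frame(fl):
    """Return (frame, result) instead of writing to a global dict."""
    pc_dict = calculate_area(fl)
    new_dict = {key: float(sum(values)) for key, values in pc_dict.items()}
    return fl, new_dict


frames_list = [0, 1, 2, 3, 4, 5, 6]

# Serial run for illustration. With a pool, the map(...) call would be
# replaced by p.imap(process_frame, frames_list), as in the code above.
full_pc_dict = dict(map(process_frame, frames_list))

print(full_pc_dict[2])  # {'a': 5.0, 'b': 4.0}
```

For the parallel version, process_frame (and calculate_area) would go into the .py file and be imported from there, the same way as magic_functions.py above.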


Karol Żak answered Jan 28 '23 11:01