
Parallel processes overwriting progress bars (tqdm)

I am writing a script in Python 3.7 that launches several parallel tasks using multiprocessing.Process (one task per core). To track the progress of each process, I use the tqdm library, which implements a progress bar. My code looks like the following:

with tqdm(total=iterator_size) as progress_bar:
    for row in tqdm(batch):
        process_batch(batch)
        progress_bar.update(1)

The progress bar is updated as expected, but since multiple processes run the code above, they overwrite each other's bars on the console, as the screenshot below illustrates.

[Screenshot: the progress bars of the running processes overwriting each other]

Upon finishing, the console correctly displays the completed progress bars:

[Screenshot: the completed progress bars, one per process]

My goal is to have the progress bars updating without overwriting each other. Is there a way I can achieve this?

A possible solution would be to display the progress bar only in the process that will take the longest (I know beforehand which one it is), but the best-case scenario would be to have one bar per process, updating as in the second image.

All the solutions I found online address multiprocessing.Pool, but I don't plan to change my architecture, since I get the most out of multiprocessing.Process.

asked Jun 20 '19 at 09:06 by GRoutar


2 Answers

To update the bars without them overwriting each other, use tqdm's position parameter (documented in the tqdm README). position sets the line on which a bar is printed: position=0 is the outermost (first) bar, position=1 the next one, and so on. In effect it is the number of lines skipped before the bar is printed, so 0 prints the bar after 0 lines and 1 prints it after 1 line. Since each process needs its own line, the position should be the index of the process, which you can get from multiprocessing.current_process().

(Note: don't pass the process PID as the position, because tqdm would then skip that many lines before printing the bar.)

from multiprocessing import current_process

from tqdm import tqdm


def worker(batch, iterator_size):
    # current_process() returns the Process object running this code:
    #   current.name      -> the process name, e.g. "Process-1"
    #   current._identity -> a tuple with the process number, e.g. (1,)
    #                        (a private attribute; it is empty in the main process)
    current = current_process()
    # One bar per process, drawn on line current._identity[0] - 1 (0, 1, 2, ...)
    with tqdm(total=iterator_size, desc=current.name,
              position=current._identity[0] - 1) as progress_bar:
        for row in batch:
            process_batch(row)  # the per-row work from the question
            progress_bar.update(1)
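A self-contained sketch of the same idea, assuming you start the workers yourself with multiprocessing.Process as in the question. Instead of relying on the private _identity attribute, it passes each worker an explicit index to use as position, and it shares one multiprocessing.RLock through tqdm.set_lock so that writes from different processes don't interleave (similar to the set_lock pattern the tqdm README shows for Pool). The worker and run functions and the time.sleep stand-in for real work are illustrative placeholders, not part of the original answer.

```python
import time
from multiprocessing import Process, RLock

from tqdm import tqdm


def worker(index, n_items, lock):
    # Use the lock created by the parent so that bars written by
    # different processes do not interleave on the terminal.
    tqdm.set_lock(lock)
    # position=index gives every process its own line (0, 1, 2, ...).
    with tqdm(total=n_items, desc=f"worker {index}", position=index) as bar:
        for _ in range(n_items):
            time.sleep(0.01)  # stand-in for the real per-row work
            bar.update(1)


def run(n_workers=4, n_items=20):
    lock = RLock()
    # Give the workers different amounts of work, like uneven batches.
    procs = [
        Process(target=worker, args=(i, n_items * (i + 1), lock))
        for i in range(n_workers)
    ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    # Exit code 0 means the worker finished without an exception.
    return [p.exitcode for p in procs]


if __name__ == "__main__":
    print(run())
```

Passing the index explicitly keeps the line assignment under your control, whereas _identity also counts any other Process objects the parent created earlier.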
answered Oct 05 '22 at 23:10 by Bibyutatsu


You can update a single shared progress bar from inside your parallel loop by calling update:

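import numpy as np
from joblib import Parallel, delayed
from tqdm import tqdm


def func(batch, pbar):
    for el in batch:
        do_something_with_el(el)
        pbar.update(1)  # advance the shared bar by one element


with tqdm(total=len(your_list)) as pbar:
    batches = np.array_split(your_list, how_many_batches)
    # prefer='threads' runs the jobs as threads, so they all share this pbar
    Parallel(n_jobs=-1, prefer='threads')(
        delayed(func)(batch, pbar) for batch in batches)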

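This shows a single progress bar for all batches rather than one bar per process. It works because prefer='threads' runs the jobs as threads of the same process, so every call to pbar.update reaches the same bar.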
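If you want to keep multiprocessing.Process rather than switching to joblib threads, one way to get the same single bar is to let the workers only count and have the parent process draw the bar. The sketch below assumes a shared multiprocessing.Value counter that each worker increments under its lock, while the parent polls it and advances one tqdm bar; the worker and run functions and the polling interval are hypothetical choices, not part of the answer above.

```python
import time
from multiprocessing import Process, Value

from tqdm import tqdm


def worker(rows, counter):
    """Process each row and bump the shared counter after every row."""
    for _ in rows:
        time.sleep(0.001)  # stand-in for the real per-row work
        with counter.get_lock():  # += on a shared Value is not atomic
            counter.value += 1


def run(batches, poll_interval=0.1):
    counter = Value("i", 0)  # shared int visible to all processes
    procs = [Process(target=worker, args=(b, counter)) for b in batches]
    for p in procs:
        p.start()

    total = sum(len(b) for b in batches)
    # Only the parent draws a bar, so nothing gets overwritten.
    with tqdm(total=total) as bar:
        while any(p.is_alive() for p in procs):
            bar.update(counter.value - bar.n)  # advance by the new rows only
            time.sleep(poll_interval)
        bar.update(counter.value - bar.n)  # final catch-up after all exit

    for p in procs:
        p.join()
    return counter.value


if __name__ == "__main__":
    data = list(range(1000))
    batches = [data[i::4] for i in range(4)]  # 4 interleaved batches
    print(run(batches))
```

The trade-off is that the bar advances in steps of poll_interval rather than on every row, in exchange for the workers never touching the terminal.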

answered Oct 06 '22 at 01:10 by Andrea Santoro