I am writing a script in Python 3.7 that launches several parallel tasks using multiprocessing.Process (one task per core). To track the progress of each process, I use the tqdm library, which implements a progress bar. My code looks like the following:
with tqdm(total=iterator_size) as progress_bar:
    for row in tqdm(batch):
        process_batch(row)
        progress_bar.update(1)
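Each worker process runs the loop above; for context, the workers are launched roughly like this (a simplified sketch, where split_into_batches and process_batch stand in for my actual code):
import os
from multiprocessing import Process
from tqdm import tqdm

def worker(batch):
    with tqdm(total=len(batch)) as progress_bar:
        for row in tqdm(batch):
            process_batch(row)
            progress_bar.update(1)

if __name__ == '__main__':
    batches = split_into_batches(data, os.cpu_count())  # one batch per core
    processes = [Process(target=worker, args=(b,)) for b in batches]
    for p in processes:
        p.start()
    for p in processes:
        p.join()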
The progress bars do update, but since multiple processes run the code above, they all write to the same console line and overwrite one another, as the first screenshot below shows.
Upon finishing, the console correctly displays the completed progress bars, one per process, as in the second screenshot:
My goal is to have the progress bars update without overwriting each other. Is there a way I can achieve this?
A possible workaround would be to display the progress bar only for the process that is going to take the longest (I know beforehand which one that is), but ideally each process would have its own bar updating, as in the second image.
All the solutions I have found online address multiprocessing.Pool, but I don't plan to change my architecture, since I get the most out of multiprocessing.Process.
For updating without overwriting you need to use the position parameter of tqdm (see the tqdm documentation): position=0 is for the outermost bar, position=1 for the next, and so on. The value is the number of lines to skip before printing the bar, i.e. 0 means the bar is printed after 0 lines and 1 means after 1 line. Since position takes a number of lines to skip, it needs the index of the process, which you can get from multiprocessing.current_process (NOTE: don't pass the pid, as tqdm would then skip that many lines before printing).
from multiprocessing import current_process
from tqdm import tqdm

""" Your code here.
current_process() is the current process object;
current_process().name gives the name of the process;
current_process()._identity gives a tuple containing the index of the process.
"""
current = current_process()
with tqdm(total=iterator_size) as progress_bar:
    for row in tqdm(batch, desc=str(current.name),
                    position=current._identity[0] - 1):
        process_batch(row)
        progress_bar.update(1)
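For reference, here is a self-contained sketch of the same idea that you can run as-is (the sleep is just a stand-in for process_batch, and the batch sizes are illustrative):
import time
from multiprocessing import Process, current_process
from tqdm import tqdm

def worker(batch):
    current = current_process()
    # _identity[0] is 1 for the first child process, 2 for the second, and so on,
    # so each worker draws its bar on its own line.
    for row in tqdm(batch, desc=str(current.name),
                    position=current._identity[0] - 1):
        time.sleep(0.01)  # stand-in for process_batch(row)

if __name__ == '__main__':
    batches = [range(200) for _ in range(4)]  # illustrative data, one batch per process
    processes = [Process(target=worker, args=(b,)) for b in batches]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
If the bars still interleave on your terminal, tqdm also provides get_lock()/set_lock() so the child processes can share the parent's write lock.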
You can update your progress bar inside your parallel loop by calling update (this example uses joblib's Parallel with threads):
import numpy as np
from joblib import Parallel, delayed
from tqdm import tqdm

def func(batch, pbar):
    for el in batch:
        do_something_with_el(el)
        pbar.update(1)

with tqdm(total=len(your_list)) as pbar:
    batches = np.array_split(your_list, how_many_batches)
    Parallel(n_jobs=-1, prefer='threads')(delayed(func)(batch, pbar)
                                          for batch in batches)
This won't spawn many progress bars: because prefer='threads' keeps every worker in the same process, they all update the single shared bar.
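For reference, a runnable sketch of this approach, with work_item and data as stand-ins for your actual per-element work and list:
import time

import numpy as np
from joblib import Parallel, delayed
from tqdm import tqdm

def work_item(el):
    time.sleep(0.01)  # stand-in for the real per-element work

def func(batch, pbar):
    for el in batch:
        work_item(el)
        pbar.update(1)  # all threads update the same shared bar

if __name__ == '__main__':
    data = list(range(400))
    with tqdm(total=len(data)) as pbar:
        batches = np.array_split(data, 8)
        Parallel(n_jobs=-1, prefer='threads')(delayed(func)(batch, pbar)
                                              for batch in batches)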