
Tracking progress of joblib.Parallel execution

Is there a simple way to track the overall progress of a joblib.Parallel execution?

I have a long-running execution composed of thousands of jobs, which I want to track and record in a database. To do that, whenever Parallel finishes a task, I need it to execute a callback reporting how many jobs are left.

I've accomplished a similar task before with Python's stdlib multiprocessing.Pool, by launching a thread that records the number of pending jobs in Pool's job list.

Looking at the code, Parallel inherits Pool, so I thought I could pull off the same trick, but it doesn't seem to use that list, and I haven't been able to figure out how else to "read" its internal status.
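The Pool-monitoring trick mentioned above can be sketched as follows. Note that `pool._cache` is an undocumented CPython implementation detail (the internal dict of pending results), so this is a sketch of the approach rather than a stable API; the `square` worker is just a stand-in job:

```python
import threading
import time
from multiprocessing import Pool


def track_progress(pool, total, interval=0.5):
    # Poll the pool's internal cache of pending results until it drains.
    # pool._cache is undocumented and may change between Python versions.
    while pool._cache:
        print(f"{len(pool._cache)} of {total} jobs still pending")
        time.sleep(interval)


def square(x):
    return x * x


if __name__ == "__main__":
    with Pool(4) as pool:
        results = [pool.apply_async(square, (i,)) for i in range(100)]
        monitor = threading.Thread(target=track_progress,
                                   args=(pool, len(results)))
        monitor.start()
        values = [r.get() for r in results]  # block until all jobs finish
        monitor.join()
    print(values[:5])
```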

Asked by Cerin, Jul 27 '14 17:07


1 Answer

Yet another step ahead from dano's and Connor's answers is to wrap the whole thing as a context manager:

import contextlib
import joblib
from tqdm import tqdm

@contextlib.contextmanager
def tqdm_joblib(tqdm_object):
    """Context manager to patch joblib to report into tqdm progress bar given as argument"""
    class TqdmBatchCompletionCallback(joblib.parallel.BatchCompletionCallBack):
        def __call__(self, *args, **kwargs):
            tqdm_object.update(n=self.batch_size)
            return super().__call__(*args, **kwargs)

    old_batch_callback = joblib.parallel.BatchCompletionCallBack
    joblib.parallel.BatchCompletionCallBack = TqdmBatchCompletionCallback
    try:
        yield tqdm_object
    finally:
        joblib.parallel.BatchCompletionCallBack = old_batch_callback
        tqdm_object.close()

Then you can use it like this, without leaving monkey-patched code behind once you're done:

from math import sqrt
from joblib import Parallel, delayed

with tqdm_joblib(tqdm(desc="My calculation", total=10)) as progress_bar:
    Parallel(n_jobs=16)(delayed(sqrt)(i**2) for i in range(10))

which I think is a great pattern; it looks similar to tqdm's pandas integration.
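The same monkey-patching idea can serve the original goal of recording progress in a database: instead of updating a tqdm bar, the patched callback invokes an arbitrary function once per completed batch. This is a sketch under the same assumptions as above (joblib's `BatchCompletionCallBack` with its `batch_size` attribute is an internal, not a public API); the callback here just counts completions, standing in for a hypothetical database write:

```python
import contextlib
import joblib
from joblib import Parallel, delayed


@contextlib.contextmanager
def callback_joblib(on_batch_complete):
    """Patch joblib so on_batch_complete(batch_size) runs after each batch.

    Relies on the internal joblib.parallel.BatchCompletionCallBack class,
    same as the tqdm_joblib recipe; not a public API.
    """
    class Callback(joblib.parallel.BatchCompletionCallBack):
        def __call__(self, *args, **kwargs):
            on_batch_complete(self.batch_size)  # e.g. UPDATE a progress row
            return super().__call__(*args, **kwargs)

    old_callback = joblib.parallel.BatchCompletionCallBack
    joblib.parallel.BatchCompletionCallBack = Callback
    try:
        yield
    finally:
        joblib.parallel.BatchCompletionCallBack = old_callback


# Usage: count completed tasks; a real handler would write to the database.
done = []
with callback_joblib(lambda n: done.append(n)):
    Parallel(n_jobs=2)(delayed(abs)(-i) for i in range(8))
print(sum(done))  # total number of completed tasks
```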

Answered by featuredpeow, Oct 01 '22 02:10