I am only using the basic joblib functionality:
Parallel(n_jobs=-1)(delayed(function)(arg) for arg in arglist)
I am frequently getting the warning:
UserWarning: A worker stopped while some jobs were given to the executor. This can be caused by a too short worker timeout or by a memory leak.
This tells me that one possible cause is a too-short worker timeout. Since I did not set a worker timeout and the default is None, that cannot be the issue. How do I go about finding a memory leak? Is there something I can do to avoid this warning? Did some parts not get executed, or should I just not worry about it?
The delayed function is a simple trick to be able to create a tuple (function, args, kwargs) with function-call syntax. Warning: under Windows, the use of multiprocessing.Pool requires protecting the main loop of the code to avoid recursive spawning of subprocesses when using joblib.
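A minimal sketch of both points, assuming a toy square function (the worker function and n_jobs value are illustrative, not from the original post):

from joblib import Parallel, delayed

def square(x):
    return x * x

# delayed(square)(3) does not call square; it returns the tuple
# (square, (3,), {}) that Parallel later dispatches to a worker.
print(delayed(square)(3))

# Under Windows, guard the entry point so worker processes do not
# recursively re-execute the parallel section on import.
if __name__ == "__main__":
    results = Parallel(n_jobs=2)(delayed(square)(i) for i in range(10))
    print(results)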
Joblib is a set of tools to provide lightweight pipelining in Python. In particular: transparent disk-caching of functions and lazy re-evaluation (memoize pattern), and easy simple parallel computing.
Joblib also provides a way to avoid recomputing the same function repeatedly, saving a lot of time and computational cost. For example, consider a function that simply computes the square of each number over a provided range; computed naively, it takes ~20 s to get the result.
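The original snippet did not survive here, but a minimal sketch of the caching pattern it describes could look like this (slow_square, the 2-second sleep, and the cache directory are illustrative assumptions):

import time
from joblib import Memory

memory = Memory("./joblib_cache", verbose=0)  # cache location is an assumption

@memory.cache
def slow_square(n):
    time.sleep(2)  # stand-in for an expensive computation
    return n ** 2

# The first pass computes and caches each result; rerunning the same
# calls returns the cached values almost instantly.
results = [slow_square(i) for i in range(10)]
results_again = [slow_square(i) for i in range(10)]  # served from cache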
To fix this, I increased the timeout:

import joblib

# Increase the worker timeout (tune this number to suit your use case).
timeout = 99999

# njobs, f_chunk, and n_chunks come from the surrounding code.
result_chunks = joblib.Parallel(n_jobs=njobs, timeout=timeout)(
    joblib.delayed(f_chunk)(i) for i in n_chunks
)
Note that this warning is benign; joblib will recover and results are complete and accurate.