I'm using Joblib to cache the results of a computationally expensive function in my Python script. The function's input arguments and return values are numpy arrays. The cache works fine for a single run of the script. Now I want to spawn multiple runs of the script in parallel to sweep a parameter in an experiment (the definition of the function stays the same across all runs).
Is there a way to share the joblib cache among multiple Python scripts running in parallel? This would save a lot of function evaluations that are repeated across different runs but do not repeat within a single run. I couldn't find anything about this in Joblib's documentation.
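For concreteness, here is a minimal sketch of the kind of setup I mean (expensive_fn, the cache path, and the sweep flag are placeholders):
import numpy as np
from joblib import Memory

mem = Memory(location="./joblib_cache", verbose=0)

@mem.cache
def expensive_fn(x):
    """Stand-in for the real computation; input and output are numpy arrays."""
    return np.fft.fft2(x).real

# each run is launched separately, e.g. python run.py --sweep-value 0.1,
# and calls expensive_fn on arrays that often recur across runs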
Joblib is a set of tools to provide lightweight pipelining in Python. In particular, it offers transparent disk-caching of functions with lazy re-evaluation (the memoize pattern), as well as easy, simple parallel computing.
Specify a common, fixed cachedir and decorate the function that you want to cache:
from joblib import Memory

cachedir = "/path/shared/by/all/runs"    # same directory for every run
mem = Memory(location=cachedir)          # location= in joblib >= 0.12; older versions used cachedir=

@mem.cache
def f(arguments):
    """do things"""
    pass
or simply
def g(arguments):
    pass

cached_g = mem.cache(g)
Then, even if you are working across processes or across machines, as long as all instances of your program have access to cachedir, common function calls can be cached there transparently.
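For example, each run of your parameter sweep could look like the sketch below (the cache path, the body of f, and the way the parameter is read are placeholders); whichever run calls f on a given array first pays the cost, and later runs read the result from disk:
# experiment.py -- one run of the sweep, launched as e.g. python experiment.py 0.1
import sys
import numpy as np
from joblib import Memory

cachedir = "/shared/joblib_cache"        # identical path in every run
mem = Memory(location=cachedir)

@mem.cache
def f(x):
    """Placeholder for the expensive computation."""
    return np.linalg.eigvalsh(x @ x.T)

param = float(sys.argv[1])               # the swept parameter for this run
x = np.full((500, 500), param)
result = f(x)                            # recomputed only if no run has cached this input yet
print(result[:5])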