Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Multiple processes sharing a single Joblib cache

I'm using Joblib to cache results of a computationally expensive function in my python script. The function's input arguments and return values are numpy arrays. The cache works fine for a single run of my python script. Now I want to spawn multiple runs of my python script in parallel for sweeping some parameter in an experiment. (The definition of the function remains same across all the runs).

Is there a way to share the joblib cache among multiple python scripts running in parallel? This would save a lot of function evaluations which are repeated across different runs but do not repeat within a single run. I couldn't find if this is possible in Joblib's documentation

like image 503
Neha Karanjkar Avatar asked Jul 30 '14 09:07

Neha Karanjkar


People also ask

How does Joblib parallel work?

By default joblib. Parallel uses the 'loky' backend module to start separate Python worker processes to execute tasks concurrently on separate CPUs.

Does Joblib parallel preserve order?

TL;DR - it preserves order for both backends.

Why is Joblib used?

Joblib provides a better way to avoid recomputing the same function repetitively saving a lot of time and computational cost. For example, let's take a simple example below: As seen above, the function is simply computing the square of a number over a range provided. It takes ~20 s to get the result.

What is Joblib Python?

Joblib is a set of tools to provide lightweight pipelining in Python. In particular: transparent disk-caching of functions and lazy re-evaluation (memoize pattern) easy simple parallel computing.


1 Answers

Specify a common, fixed cachedir and decorate the function that you want to cache using

from joblib import Memory
mem = Memory(cachedir=cachedir)

@mem.cache
def f(arguments):
    """do things"""
    pass

or simply

def g(arguments):
   pass

cached_g = mem.cache(g)

Then, even if you are working across processes, across machines, if all instances of your program have access to cachedir, then common function calls can be cached there transparently.

like image 99
eickenberg Avatar answered Oct 19 '22 04:10

eickenberg