Selective Re-Memoization of DataFrames

Question

Say I set up memoization with Joblib as follows (using the solution provided here):

from tempfile import mkdtemp
cachedir = mkdtemp()

from joblib import Memory
# Recent joblib releases take `location`; older ones called this argument `cachedir`.
memory = Memory(location=cachedir, verbose=0)

@memory.cache
def run_my_query(my_query):
    ...
    return df

And say I define a couple of queries, query_1 and query_2, both of which take a long time to run.

I understand that, with the code as it is:

The second call with either query would use the memoized output, i.e.:

run_my_query(query_1)
run_my_query(query_1) # <- Uses cached output

run_my_query(query_2)
run_my_query(query_2) # <- Uses cached output

I could use memory.clear() to delete the entire cache directory.

But what if I want to re-do the memoization for only one of the queries (e.g. query_2) without forcing a delete on the other query?

falsetru · Accepted Answer

It seems like the library does not support partial erase of the cache.

You can split the cache and the function into two pairs, one per query:

from tempfile import mkdtemp
from joblib import Memory

# Recent joblib releases take `location`; older ones called this argument `cachedir`.
memory1 = Memory(location=mkdtemp(), verbose=0)
memory2 = Memory(location=mkdtemp(), verbose=0)

@memory1.cache
def run_my_query1():
    # run query_1
    return df

@memory2.cache
def run_my_query2():
    # run query_2
    return df

Now, you can selectively clear the cache:

memory2.clear()

UPDATE after seeing behzad.nouri's comment:

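You can use the call method of the decorated function to force re-execution for specific arguments. As the following example shows, its return value differs from that of a normal call (here it is a tuple of the output and a metadata dict), so you need to handle it accordingly.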

>>> import tempfile
>>> import joblib
>>> memory = joblib.Memory(location=tempfile.mkdtemp(), verbose=0)
>>> @memory.cache
... def run(x):
...     print('called with {}'.format(x))  # for debug
...     return x
...
>>> run(1)
called with 1
1
>>> run(2)
called with 2
2
>>> run(3)
called with 3
3
>>> run(2)  # Cached
2
>>> run.call(2)  # Force call of the original function
called with 2
(2, {'duration': 0.0011069774627685547, 'input_args': {'x': '2'}})
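
One way to take care of the different return value, sketched here under assumptions: discard what call returns and follow it with a normal call, which then reads the refreshed entry back from the cache. The helper name refresh is hypothetical, and the sketch assumes that call overwrites the stored result for the given arguments, which is worth confirming against the joblib version in use.

```python
def refresh(cached_func, *args, **kwargs):
    """Recompute one cached entry and return the plain function output.

    `refresh` is a hypothetical helper, not part of joblib. It assumes
    `cached_func` was wrapped with `memory.cache`, so that it exposes the
    `call` method shown above, and that `call` rewrites the stored result
    for these arguments.
    """
    # Force execution of the original function. The return value is
    # ignored because it is not the bare result (the transcript above
    # shows it bundled with a metadata dict).
    cached_func.call(*args, **kwargs)
    # A normal call with the same arguments now hits the updated cache
    # entry and returns the output in its usual form.
    return cached_func(*args, **kwargs)
```

With the question's setup, refresh(run_my_query, query_2) would re-run only query_2 and leave the cached result for query_1 untouched. The price is one extra read from the cache after the forced call.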

Tags: python, pandas, joblib

Asked by Amelio Vazquez-Reina

1 answer: falsetru
