Say I set up memoization with Joblib as follows (using the solution provided here):
from tempfile import mkdtemp
cachedir = mkdtemp()
from joblib import Memory
memory = Memory(location=cachedir, verbose=0)  # newer joblib releases take location=; older ones used cachedir=
@memory.cache
def run_my_query(my_query):
    ...
    return df
And say I define a couple of queries, query_1 and query_2, both of which take a long time to run.
I understand that, with the code as it is:
The second call with either query would use the memoized output, i.e.:
run_my_query(query_1)
run_my_query(query_1) # <- Uses cached output
run_my_query(query_2)
run_my_query(query_2) # <- Uses cached output
I could use memory.clear() to delete the entire cache directory.
But what if I want to redo the memoization for only one of the queries (e.g. query_2) without forcing a delete on the other query?
It seems like the library does not support partial erase of the cache.
You can split the cache and the function into two pairs:
from tempfile import mkdtemp
from joblib import Memory
# newer joblib releases take location=; older ones used cachedir=
memory1 = Memory(location=mkdtemp(), verbose=0)
memory2 = Memory(location=mkdtemp(), verbose=0)
@memory1.cache
def run_my_query1():
    # run query_1
    return df
@memory2.cache
def run_my_query2():
    # run query_2
    return df
Now, you can selectively clear the cache:
memory2.clear()
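To check that this really leaves the other cache alone, here is a minimal runnable sketch. It assumes a recent joblib where Memory takes location=, and the calls counter and the string return values are stand-ins I made up in place of the real queries:

```python
from tempfile import mkdtemp
from joblib import Memory

# One Memory object (and one cache directory) per query.
memory1 = Memory(location=mkdtemp(), verbose=0)
memory2 = Memory(location=mkdtemp(), verbose=0)

# Hypothetical counter: records how often each function body really runs.
calls = {"q1": 0, "q2": 0}

@memory1.cache
def run_my_query1():
    calls["q1"] += 1
    return "result of query_1"  # stand-in for the real DataFrame

@memory2.cache
def run_my_query2():
    calls["q2"] += 1
    return "result of query_2"  # stand-in for the real DataFrame

run_my_query1()
run_my_query2()            # both computed once
run_my_query1()
run_my_query2()            # both served from their caches

memory2.clear(warn=False)  # wipe only the second cache directory

run_my_query1()            # still cached
run_my_query2()            # computed again
```

After this runs, calls should show one execution for query 1 and two for query 2. The obvious downside is that you need one Memory object per function you want to invalidate on its own.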
UPDATE after seeing behzad.nouri's comment:
You can use the call method of the decorated function. But as you can see in the following example, its return value differs from that of a normal call, so you need to handle that yourself.
>>> import tempfile
>>> import joblib
>>> memory = joblib.Memory(location=tempfile.mkdtemp(), verbose=0)  # older joblib: cachedir=
>>> @memory.cache
... def run(x):
...     print('called with {}'.format(x))  # for debug
...     return x
...
>>> run(1)
called with 1
1
>>> run(2)
called with 2
2
>>> run(3)
called with 3
3
>>> run(2) # Cached
2
>>> run.call(2) # Force call of the original function
called with 2
(2, {'duration': 0.0011069774627685547, 'input_args': {'x': '2'}})
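Since the shape of what call returns may vary between joblib versions, one way to sidestep it is to ignore that return value and read the refreshed result back through the normal cached call. A rough sketch, assuming a recent joblib with location=; the executions list is only there to show which arguments actually ran the function body:

```python
import tempfile
import joblib

memory = joblib.Memory(location=tempfile.mkdtemp(), verbose=0)

# Hypothetical log of the arguments for which the body was really executed.
executions = []

@memory.cache
def run(x):
    executions.append(x)
    return x * 10

run(1)           # computed
run(2)           # computed
run(2)           # served from the cache

run.call(2)      # force a re-run for x=2 only; its return value is ignored here
result = run(2)  # normal call: reads the refreshed cache entry, plain value
run(1)           # the entry for x=1 was never touched, still cached
```

Here executions ends up as [1, 2, 2]: the only extra execution is the forced one for x=2, and the entry for x=1 is left as it was.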