
Selective Re-Memoization of DataFrames

Say I set up memoization with Joblib as follows (using the solution provided here):

from tempfile import mkdtemp
cachedir = mkdtemp()

from joblib import Memory
memory = Memory(cachedir=cachedir, verbose=0)

@memory.cache
def run_my_query(my_query):
    ...
    return df

And say I define a couple of queries, query_1 and query_2, both of which take a long time to run.

I understand that, with the code as it is:

  • The second call with either query would use the memoized output, i.e.:

    run_my_query(query_1)
    run_my_query(query_1) # <- Uses cached output
    
    run_my_query(query_2)
    run_my_query(query_2) # <- Uses cached output   
    
  • I could use memory.clear() to delete the entire cache directory

But what if I want to redo the memoization for only one of the queries (e.g. query_2) without deleting the cached result of the other query?

Amelio Vazquez-Reina asked Sep 23 '14 14:09


1 Answer

It seems that memory.clear() always wipes everything stored under that Memory object; it does not offer a way to erase only part of the cache.

You can split the cache and the function into two pairs, one per query:

from tempfile import mkdtemp
from joblib import Memory

memory1 = Memory(cachedir=mkdtemp(), verbose=0)
memory2 = Memory(cachedir=mkdtemp(), verbose=0)

@memory1.cache
def run_my_query1():
    # run query_1
    return df

@memory2.cache
def run_my_query2():
    # run query_2
    return df

Now, you can selectively clear the cache:

memory2.clear()
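
If keeping one Memory object per query is inconvenient, another possibility is to drop a single cached entry through the reference returned by call_and_shelve. This is a different technique from the one above, shown only as a minimal sketch: it assumes a joblib version recent enough to accept `location=` (older releases used `cachedir=`), and the `executed` list and the string return value are hypothetical stand-ins for real query work and a real DataFrame.

```python
import tempfile
from joblib import Memory

# Assumption: recent joblib, where the cache directory is passed as `location`.
memory = Memory(location=tempfile.mkdtemp(), verbose=0)

executed = []  # records every real (non-cached) execution

@memory.cache
def run_my_query(my_query):
    executed.append(my_query)
    return "rows for " + my_query  # hypothetical stand-in for a DataFrame

run_my_query("query_1")
run_my_query("query_2")

# call_and_shelve returns a reference to the cache entry for these arguments;
# clear() on that reference erases only this one entry from disk.
run_my_query.call_and_shelve("query_2").clear()

run_my_query("query_1")  # still served from the cache
run_my_query("query_2")  # recomputed and cached again
```

After this runs, `executed` should hold query_1 once and query_2 twice, which shows that only the cleared entry was recomputed.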

UPDATE after seeing behzad.nouri's comment:

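You can use the call method of the decorated function to force the original function to run again. But as the following example shows, its return value differs from that of a normal call: here it is a tuple of the result and a dict of call metadata, so you need to handle that yourself.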

>>> import tempfile
>>> import joblib
>>> memory = joblib.Memory(cachedir=tempfile.mkdtemp(), verbose=0)
>>> @memory.cache
... def run(x):
...     print('called with {}'.format(x))  # for debug
...     return x
...
>>> run(1)
called with 1
1
>>> run(2)
called with 2
2
>>> run(3)
called with 3
3
>>> run(2)  # Cached
2
>>> run.call(2)  # Force call of the original function
called with 2
(2, {'duration': 0.0011069774627685547, 'input_args': {'x': '2'}})
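
One way to sidestep the differing return value is to discard it and read the refreshed result back through a normal call. The following is a minimal sketch, not a definitive recipe: it assumes a joblib version recent enough to accept `location=` (older releases used `cachedir=`), and `FAKE_DB`, `executed`, and `refresh` are hypothetical names, with plain integers standing in for DataFrames.

```python
import tempfile
from joblib import Memory

# Assumption: recent joblib, where the cache directory is passed as `location`.
memory = Memory(location=tempfile.mkdtemp(), verbose=0)

# Hypothetical stand-in for a slow database; ints stand in for DataFrames.
FAKE_DB = {"query_1": 10, "query_2": 20}
executed = []  # records every real (non-cached) execution

@memory.cache
def run_my_query(my_query):
    executed.append(my_query)
    return FAKE_DB[my_query]

def refresh(my_query):
    """Re-run one query, overwrite its cache entry, return the fresh value."""
    run_my_query.call(my_query)    # forces execution and persists the result
    return run_my_query(my_query)  # normal call, served from the updated cache

run_my_query("query_1")
run_my_query("query_2")
FAKE_DB["query_2"] = 99              # the underlying data changes

stale = run_my_query("query_2")      # still the old cached value
fresh = refresh("query_2")           # only query_2 is recomputed
untouched = run_my_query("query_1")  # query_1 stays cached
```

Ignoring what call returns keeps the helper independent of the exact shape of that value, at the cost of one extra read from the cache.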
falsetru answered Sep 22 '22 18:09