Say I have a function that runs a SQL query and returns a dataframe:
import pandas.io.sql as psql
import sqlalchemy
query_string = "select a from table;"
def run_my_query(my_query):
    # username, host, port and database are hard-coded here
    engine = sqlalchemy.create_engine('postgresql://{username}@{host}:{port}/{database}'.format(username=username, host=host, port=port, database=database))
    df = psql.read_sql(my_query, engine)
    return df
# Run the query (this is what I want to memoize)
df = run_my_query(query_string)
I would like to memoize this query, with one cache entry per query_string (i.e. per query).
How can I do this with joblib or jug?
Yes, you can do this with joblib (this example basically pastes itself):
>>> from tempfile import mkdtemp
>>> cachedir = mkdtemp()
>>> from joblib import Memory
>>> memory = Memory(location=cachedir, verbose=0)  # older joblib releases called this argument cachedir
>>> @memory.cache
... def run_my_query(my_query):
...     ...
...     return df
You can clear the cache using memory.clear().
Note that you could also use functools.lru_cache, or even memoize "manually" with a simple dict:
def run_my_query(my_query, cache={}):
    if my_query in cache:
        return cache[my_query]
    ...
    cache[my_query] = df
    return df
You could clear the cache with run_my_query.__defaults__[0].clear() (func_defaults[0] in Python 2). Not sure I'd recommend this though, just thought it was a fun example.
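For the lru_cache option, here is a minimal sketch using only the standard library. The function body is a stand-in for the real engine and read_sql call from the question, and call_count is a hypothetical counter added only to show when the "database" is actually hit:

```python
from functools import lru_cache

call_count = 0  # hypothetical counter: how often the stand-in "database" is hit


@lru_cache(maxsize=None)  # unbounded cache, one entry per distinct query string
def run_my_query(my_query):
    global call_count
    call_count += 1
    # Stand-in for the real sqlalchemy/pandas call from the question
    return ("result of", my_query)


first = run_my_query("select a from table;")
second = run_my_query("select a from table;")  # served from the cache
info_before = run_my_query.cache_info()        # hits/misses so far

run_my_query.cache_clear()                     # force a cache reset on demand
third = run_my_query("select a from table;")   # runs the function body again
```

Two things to keep in mind: lru_cache needs hashable arguments (a query string is fine), and it hands back the same object on every hit, so if the real function returns a DataFrame, mutating it in place also changes what the cache returns next time.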