
Data structure of memoization in db

What is the best data structure to cache (save/store/memoize) the results of many functions in a database? Suppose a function calc_regress with the following definition in Python:

def calc_regress(ind_key, dep_key, count=30):
    independent_list = sql_select_recent_values(count, ind_key)
    dependant_list = sql_select_recent_values(count, dep_key)
    import scipy.stats as st
    return st.linregress(independent_list, dependant_list)

I have seen the answers to What kind of table structure should be used to store memoized function parameters and results in a relational database?, but they seem to address only a single function, whereas I have about 500 functions.

asked Oct 30 '22 by mohsen Lzd

2 Answers

Option A

You could use the structure in the linked answer, un-normalized, with the number of argument columns equal to the maximum number of arguments among the 500 functions. You would also need to add a column for the function name.

Then you could do a SELECT * FROM expensive_func_results WHERE func_name = 'calc_regress' AND arg1 = ind_key AND arg2 = dep_key AND arg3 = count, etc.

Of course, that's not a very good design to use. For the same function called with fewer parameters, columns with NULL values / non-matches need to be ignored; otherwise you'll get multiple result rows.
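
A minimal sketch of Option A, assuming SQLite, three argument columns (widen to whatever your real maximum is), and pickled/stringified values; lookup is just an illustrative helper:

import sqlite3

conn = sqlite3.connect('memo.db')
conn.execute("""
    CREATE TABLE IF NOT EXISTS expensive_func_results (
        func_name TEXT,
        arg1 TEXT, arg2 TEXT, arg3 TEXT,  -- one column per argument, up to the max
        result BLOB                       -- e.g. a pickled return value
    )
""")

def lookup(func_name, arg1=None, arg2=None, arg3=None):
    # SQLite's IS operator matches NULL against NULL, so a 2-arg call
    # doesn't collide with a 3-arg call of the same function.
    row = conn.execute(
        "SELECT result FROM expensive_func_results "
        "WHERE func_name = ? AND arg1 IS ? AND arg2 IS ? AND arg3 IS ?",
        (func_name, arg1, arg2, arg3)).fetchone()
    return row[0] if row else None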

Option B

Create the table/structure as func_name, arguments, result, where 'arguments' is always a kwargs dictionary or positional args, but not mixed per entry. Even when the kwargs dict is stored as a string, the order of its keys and values is not predictable or consistent, even for the same args, so you'll need to order it before converting it to a string and storing it. When you want to query, you'll use SELECT * FROM expensive_func_results WHERE func_name = 'calc_regress' AND arguments = 'str(kwargs_dict)', where str(kwargs_dict) is something you'll build programmatically. It could also be built from the result of inspect.getargspec (or inspect.getcallargs), though you'll have to check for consistency.

You won't be able to query on argument combinations unless you provide all the arguments to the query, or do a partial match with LIKE.
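
A minimal sketch of that key building for Option B, assuming kwargs-only calls and the arguments column stored as text; make_args_key and lookup are illustrative names:

def make_args_key(**kwargs):
    # Sort by key so the same arguments always serialize to the same string.
    return str(sorted(kwargs.items()))

def lookup(conn, func_name, **kwargs):
    row = conn.execute(
        "SELECT result FROM expensive_func_results "
        "WHERE func_name = ? AND arguments = ?",
        (func_name, make_args_key(**kwargs))).fetchone()
    return row[0] if row else None

# make_args_key(ind_key=1, dep_key=2, count=30)
# -> "[('count', 30), ('dep_key', 2), ('ind_key', 1)]"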

Option C

Normalised all the way: One table func_calls as func_name, args_combo_id, arg_name_idx, arg_value. Each row of the table will store one arg for one combo of that function's calling args. Another table func_results as func_name, args_combo_id, result. You could also normalise further for func_name to be mapped to a func_id.

In this one, the order of keyword args doesn't matter, since you'll be doing an inner join to select each parameter. This query will have to be built programmatically or done via a stored procedure, since the number of joins required to fetch all the parameters is determined by the number of parameters. Your function above has 3 params, but you may have another with 10. arg_name_idx is 'argument name or index', so it also works for mixed kwargs + args. Some duplication may occur in cases like calc_regress(ind_key=1, dep_key=2, count=30) and calc_regress(1, 2, 30) (as well as calc_regress(1, 2) with a default value for count <-- these cases should be avoided; the table entry should have all args), since the args_combo_id will be different for each even though the result is obviously the same. Again, the inspect module may help in this area.
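
A minimal sketch of Option C, assuming SQLite and string-converted argument values; the query is built programmatically with one join per argument, so keyword order doesn't matter (lookup is an illustrative helper):

import sqlite3

conn = sqlite3.connect('memo.db')
conn.executescript("""
    CREATE TABLE IF NOT EXISTS func_calls (
        func_name TEXT, args_combo_id INTEGER, arg_name_idx TEXT, arg_value TEXT);
    CREATE TABLE IF NOT EXISTS func_results (
        func_name TEXT, args_combo_id INTEGER, result BLOB);
""")

def lookup(func_name, call_args):
    # call_args maps argument name (or positional index) -> value,
    # e.g. {'ind_key': 1, 'dep_key': 2, 'count': 30}.
    # Assumes every stored combo has a row for each of its args (see above).
    joins, params = [], []
    for i, (name, value) in enumerate(call_args.items()):
        joins.append(
            f"JOIN func_calls c{i} ON c{i}.func_name = r.func_name "
            f"AND c{i}.args_combo_id = r.args_combo_id "
            f"AND c{i}.arg_name_idx = ? AND c{i}.arg_value = ?")
        params += [str(name), str(value)]
    sql = ("SELECT r.result FROM func_results r " + " ".join(joins) +
           " WHERE r.func_name = ?")
    row = conn.execute(sql, params + [func_name]).fetchone()
    return row[0] if row else None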


[Edit] PS: Additionally, for the func_name, you may need to use a fully qualified name to avoid conflicts across modules in your package. Decorators may interfere with that as well, unless they copy the metadata over (deco.__name__ = func.__name__, etc., e.g. via functools.wraps).
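
For example, a small sketch (qualified_name and memoize_in_db are illustrative names; functools.wraps is standard library):

import functools

def qualified_name(func):
    # 'mypackage.stats.calc_regress' rather than a bare 'calc_regress'
    return '%s.%s' % (func.__module__, func.__qualname__)

def memoize_in_db(func):
    @functools.wraps(func)  # copies __name__, __module__, __qualname__ onto the wrapper
    def wrapper(*args, **kwargs):
        # ...look up qualified_name(wrapper) + args in the db, else call func...
        return func(*args, **kwargs)
    return wrapper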

PPS: If objects are being passed to functions being memoized in the db, make sure that their __str__ is something useful & repeatable/consistent to store as arg values.

This particular case doesn't require you to re-create objects from the arg values in the db; otherwise, you'd need to make __str__ or __repr__ work the way __repr__ was intended to (but generally isn't):

this should look like a valid Python expression that could be used to recreate an object with the same value (given an appropriate environment).

answered Nov 15 '22 by aneroid


I'd use a key-value store here, where the key could be a concatenation of the id of the function object (to guarantee the key's uniqueness) and its arguments, while the value would be the function's returned value.

So a calc_regress(1, 5, 30) call would produce an example key 139694472779248_1_5_30, where the first part is id(calc_regress). An example key-producing function:

>>> def produce_cache_key(fun, *args, **kwargs):
...     args_key = '_'.join(str(a) for a in args)
...     # sort kwargs so the same call always produces the same key
...     kwargs_key = '_'.join('%s%s' % (k, v) for k, v in sorted(kwargs.items()))
...     return '%s_%s_%s' % (id(fun), args_key, kwargs_key)

You could keep your results in memory using a dictionary and a decorator:

>>> def cache_result(cache):
...     def decorator(fun):
...         def wrapper(*args, **kwargs):
...             key = produce_cache_key(fun, *args, **kwargs)
...             if key not in cache:
...                 cache[key] = fun(*args, **kwargs)
...             return cache[key]
...         return wrapper
...     return decorator
... 
>>> 
>>> cache = {}
>>> @cache_result(cache)
... def fx(x, y, z=0):
...     print('Doing some expensive job...')
... 
>>> fx(1, 2, z=1)
Doing some expensive job...
>>> fx(1, 2, z=1)
>>> 
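
If the results need to live in the database and survive process restarts (as the question asks), the same decorator pattern can write through to a table instead of a dict. A minimal sketch using sqlite3 and pickle (both standard library); db_cache_result is an illustrative name, and it keys on fun.__name__ rather than id(fun), since id() changes between runs (the first answer's note about fully-qualified names applies here too):

import pickle
import sqlite3

def db_cache_result(db_path):
    conn = sqlite3.connect(db_path)
    conn.execute('CREATE TABLE IF NOT EXISTS cache (key TEXT PRIMARY KEY, value BLOB)')

    def decorator(fun):
        def wrapper(*args, **kwargs):
            args_key = '_'.join(str(a) for a in args)
            kwargs_key = '_'.join('%s%s' % (k, v) for k, v in sorted(kwargs.items()))
            key = '%s_%s_%s' % (fun.__name__, args_key, kwargs_key)
            row = conn.execute('SELECT value FROM cache WHERE key = ?', (key,)).fetchone()
            if row:
                return pickle.loads(row[0])
            result = fun(*args, **kwargs)
            conn.execute('INSERT INTO cache (key, value) VALUES (?, ?)',
                         (key, pickle.dumps(result)))
            conn.commit()
            return result
        return wrapper
    return decorator

@db_cache_result('memo.db')
def fx(x, y, z=0):
    print('Doing some expensive job...')
    return x + y + z
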
answered Nov 15 '22 by matino