I want to cache the results of some functions/methods to disk, with invalidation based on code changes rather than time.
I know there are decorators for disk-based caches, but their expiration is usually time-based, which is irrelevant to my needs.
I thought about using the Git commit SHA to detect the function/class version, but the problem is that there are multiple functions/classes in the same file. I need a way to check whether the specific function/class segment of the file has changed.
I assume the solution will be a combination of version management and caching, but I'm too unfamiliar with the possibilities to solve this elegantly.
Example:

```python
# file a.py
@cache_by_version
def f(a, b):
    # ...

@cache_by_version
def g(a, b):
    # ...
```

```python
# file b.py
from a import *

def main():
    f(1, 2)
```
Running `main` in file `b.py` once should cache the result of `f` with arguments `1` and `2` to disk. Running `main` again should fetch the result from the cache without evaluating `f(1, 2)` again. However, if `f` changed, the cache should be invalidated. On the other hand, if `g` changed, it should not affect the caching of `f`.
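For what it's worth, `inspect.getsource` returns only the named function's own segment of the file, which is the granularity needed here: a change to `g` does not alter `f`'s source. A minimal sketch of a per-function fingerprint (assuming the source is available, e.g. CPython running ordinary `.py` files; the `source_fingerprint` name is made up for illustration):

```python
import hashlib
import inspect

def source_fingerprint(fn):
    # Hash only this function's own source segment, so edits to other
    # functions in the same file leave the fingerprint unchanged.
    return hashlib.md5(inspect.getsource(fn).encode('utf-8')).hexdigest()

def f(a, b):
    return a + b

def g(a, b):
    return a * b

# f's fingerprint depends only on f's source, not on g's
print(source_fingerprint(f))
```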
Ok, so after a bit of messing around here's something that mostly works:
```python
import os
import hashlib
import pickle
from functools import wraps
import inspect

# just cache in a "cache" directory within the current working directory
# also using pickle, but there are other caching libraries out there
# that might be more useful
__cache_dir__ = os.path.join(os.path.abspath(os.getcwd()), 'cache')


def _read_from_cache(cache_key):
    cache_file = os.path.join(__cache_dir__, cache_key)
    if os.path.exists(cache_file):
        with open(cache_file, 'rb') as f:
            return pickle.load(f)
    return None


def _write_to_cache(cache_key, value):
    cache_file = os.path.join(__cache_dir__, cache_key)
    if not os.path.exists(__cache_dir__):
        os.mkdir(__cache_dir__)
    with open(cache_file, 'wb') as f:
        pickle.dump(value, f)


def cache_result(fn):
    @wraps(fn)
    def _decorated(*arg, **kw):
        m = hashlib.md5()
        fn_src = inspect.getsourcelines(fn)
        m.update(str(fn_src).encode('utf-8'))
        # generate a different key based on the arguments too
        m.update(str(arg).encode('utf-8'))  # could do a better job with arguments
        m.update(str(kw).encode('utf-8'))
        cache_key = m.hexdigest()
        cached = _read_from_cache(cache_key)
        if cached is not None:
            return cached
        value = fn(*arg, **kw)
        _write_to_cache(cache_key, value)
        return value
    return _decorated


@cache_result
def add(a, b):
    print("Add called")
    return a + b


if __name__ == '__main__':
    print(add(1, 2))
```
This uses `inspect.getsourcelines` to read in the function's code and uses it (along with the arguments) to generate the key for looking up the cache. This means any change to the function (even whitespace) will generate a new cache key, and the function will need to be called again.
Note, though, that if the function calls other functions and those have changed, you will still get the original cached result, which may be unexpected.
So this is probably fine for something computationally intensive or involving heavy network activity, but you might find you need to clear the cache directory every now and then.
One downside of using `getsourcelines` is that it won't work if you don't have access to the source. For most Python programs, though, this shouldn't be too big a problem.
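If the source isn't available, one possible fallback is to fingerprint the compiled code object instead (the `bytecode_fingerprint` name is made up here; note that bytecode is not stable across Python versions, so a cache keyed this way is only valid per interpreter):

```python
import hashlib
import marshal

def bytecode_fingerprint(fn):
    # Fallback when source isn't available: hash the function's
    # compiled code object. Bytecode changes between Python versions,
    # so this key is only stable for a given interpreter.
    return hashlib.md5(marshal.dumps(fn.__code__)).hexdigest()

def mul(a, b):
    return a * b

fingerprint = bytecode_fingerprint(mul)
```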
So I'd take this as a starting point, rather than as a fully working solution.
Also, it uses pickle to store the cached value, so it's only safe to use if you trust whatever can write to the cache directory.