Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Hashing a python function to regenerate output when the function is modified

Tags:

I have a python function that has a deterministic result. It takes a long time to run and generates a large output:

def time_consuming_function():
    # lots_of_computing_time to come up with the_result
    return the_result

I modify time_consuming_function from time to time, but I would like to avoid having it run again while it's unchanged. [time_consuming_function only depends on functions that are immutable for the purposes considered here; i.e. it might have functions from Python libraries but not from other pieces of my code that I'd change.] The solution that suggests itself to me is to cache the output and also cache some "hash" of the function. If the hash changes, the function will have been modified, and we have to re-generate the output.

Is this possible or ridiculous?


Updated: based on the answers, it looks like what I want to do is to "memoize" time_consuming_function, except instead of (or in addition to) arguments passed into an invariant function, I want to account for a function that itself will change.

like image 556
Seth Johnson Avatar asked Apr 26 '10 20:04

Seth Johnson


2 Answers

If I understand your problem, I think I'd tackle it like this. It's a touch evil, but I think it's more reliable and on-point than the other solutions I see here.

import inspect
import functools
import json

def memoize_zeroadic_function_to_disk(memo_filename):
    def decorator(f):
        try:
            with open(memo_filename, 'r') as fp:
                cache = json.load(fp)
        except IOError:
            # file doesn't exist yet
            cache = {}

        source = inspect.getsource(f)

        @functools.wraps(f)
        def wrapper():
            if source not in cache:
                cache[source] = f()
                with open(memo_filename, 'w') as fp:
                    json.dump(cache, fp)

            return cache[source]
        return wrapper
    return decorator

@memoize_zeroadic_function_to_disk(...SOME PATH HERE...)
def time_consuming_function():
    # lots_of_computing_time to come up with the_result
    return the_result
like image 90
Mike Graham Avatar answered Oct 15 '22 14:10

Mike Graham


Rather than putting the function in a string, I would put the function in its own file. Call it time_consuming.py, for example. It would look something like this:

def time_consuming_method():
   # your existing method here

# Is the cached data older than this file?
if (not os.path.exists(data_file_name) 
    or os.stat(data_file_name).st_mtime < os.stat(__file__).st_mtime):
    data = time_consuming_method()
    save_data(data_file_name, data)
else:
    data = load_data(data_file_name)

# redefine method
def time_consuming_method():
    return data

While testing the infrastructure for this to work, I'd comment out the slow parts. Make a simple function that just returns 0, get all of the save/load stuff working to your satisfaction, then put the slow bits back in.

like image 26
Daniel Stutzbach Avatar answered Oct 15 '22 14:10

Daniel Stutzbach