Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is there a pythonic way to support keyword arguments for a memoize decorator in Python?

So I recently asked a question about memoization and got some great answers, and now I want to take it to the next level. After quite a bit of googling, I could not find a reference implementation of a memoize decorator that was able to cache a function that took keyword arguments. In fact, most of them simply used *args as the key for cache lookups, meaning that it would also break if you wanted to memoize a function that accepted lists or dicts as arguments.

In my case, the first argument to the function is a unique identifier by itself, suitable for use as a dict key for cache lookups, however I wanted the ability to use keyword arguments and still access the same cache. What I mean is, my_func('unique_id', 10) and my_func(foo=10, func_id='unique_id') should both return the same cached result.

In order to do this, what we need is a clean and pythonic way of saying 'inspect kwargs for whichever keyword it is that corresponds to the first argument)'. This is what I came up with:

class memoize(object):
    def __init__(self, cls):
        if type(cls) is FunctionType:
            # Let's just pretend that the function you gave us is a class.
            cls.instances = {}
            cls.__init__ = cls
        self.cls = cls
        self.__dict__.update(cls.__dict__)

    def __call__(self, *args, **kwargs):
        """Return a cached instance of the appropriate class if it exists."""
        # This is some dark magic we're using here, but it's how we discover
        # that the first argument to Photograph.__init__ is 'filename', but the
        # first argument to Camera.__init__ is 'camera_id' in a general way.
        delta = 2 if type(self.cls) is FunctionType else 1
        first_keyword_arg = [k
            for k, v in inspect.getcallargs(
                self.cls.__init__,
                'self',
                'first argument',
                *['subsequent args'] * (len(args) + len(kwargs) - delta)).items()
                    if v == 'first argument'][0]
        key = kwargs.get(first_keyword_arg) or args[0]
        print key
        if key not in self.cls.instances:
            self.cls.instances[key] = self.cls(*args, **kwargs)
        return self.cls.instances[key]

The crazy thing is that this actually works. Eg, if you decorate like this:

@memoize
class FooBar:
    instances = {}

    def __init__(self, unique_id, irrelevant=None):
        print id(self)

Then from your code you can call either FooBar('12345', 20) or FooBar(irrelevant=20, unique_id='12345') and actually get the same instance of FooBar. You can then define a different class with a different name for the first argument, because it works in a general way (ie, the decorator doesn't need to know anything specfic about the class it's decorating in order for this to work).

The problem is, it's an ungodly mess ;-)

It works because inspect.getcallargs returns a dict mapping the defined keywords to the arguments you supply it, so I supply it some phony arguments and then inspect the dict for the first argument that got passed.

What would be much nicer, if such a thing were to even exist, is an analogue to inspect.getcallargs that returned both kinds of arguments unified as a list of the arguments instead of as a dict of the keyword arguments. That would allow something like this:

def __call__(self, *args, **kwargs):
    key = inspect.getcallargsaslist(self.cls.__init__, None, *args, **kwargs)[1]
    if key not in self.cls.instances:
        self.cls.instances[key] = self.cls(*args, **kwargs)
    return self.cls.instances[key]

The other way I can see of tackling this would be using the dict provided by inspect.getcallargs as the lookup cache key directly, but that would require a repeatable way to make identical strings from identical hashes, which is something I've heard can't be relied upon (I guess i'd have to construct the string myself after sorting the keys).

Does anybody have any thoughts on this? Is it Wrong to want to call a function with keyword arguments and cache the results? Or just Very Difficult?

like image 239
robru Avatar asked Jun 06 '12 18:06

robru


People also ask

How does memoize work in Python?

Memoization allows you to optimize a Python function by caching its output based on the parameters you supply to it. Once you memoize a function, it will only compute its output once for each set of parameters you call it with. Every call after the first will be quickly retrieved from a cache.

What is memoization and how can I use it in Python?

Memoization is an efficient software optimization technique used to speed up programs. It allows you to optimize a python function by catching its output based on the supplied input parameters. Memoization ensures that a method runs for the same input only once.

How does cache decorator work?

cache is a decorator that helps in reducing function execution for the same inputs using the memoization technique. The function returns the same value as lru_cache(maxsize=None) , where the cache grows indefinitely without evicting old values.

How do you cache a variable in Python?

Implementing a Cache Using a Python Dictionary You can use the article's URL as the key and its content as the value. Save this code to a caching.py file, install the requests library, then run the script: $ pip install requests $ python caching.py Getting article... Fetching article from server...


2 Answers

Try lru_cache:

@functools.lru_cache(maxsize=128, typed=False)

Decorator to wrap a function with a memoizing callable that saves up to the maxsize most recent calls. It can save time when an expensive or I/O bound function is periodically called with the same arguments.

lru_cache added in python 3.2, but can be backported into 2.x

like image 113
Aleksei astynax Pirogov Avatar answered Nov 09 '22 23:11

Aleksei astynax Pirogov


I'd suggest something like the following:

import inspect

class key_memoized(object):
    def __init__(self, func):
       self.func = func
       self.cache = {}

    def __call__(self, *args, **kwargs):
        key = self.key(args, kwargs)
        if key not in self.cache:
            self.cache[key] = self.func(*args, **kwargs)
        return self.cache[key]

    def normalize_args(self, args, kwargs):
        spec = inspect.getargs(self.func.__code__).args
        return dict(kwargs.items() + zip(spec, args))

    def key(self, args, kwargs):
        a = self.normalize_args(args, kwargs)
        return tuple(sorted(a.items()))

Example:

@key_memoized
def foo(bar, baz, spam):
    print 'calling foo: bar=%r baz=%r spam=%r' % (bar, baz, spam)
    return bar + baz + spam

print foo(1, 2, 3)
print foo(1, 2, spam=3)         #memoized
print foo(spam=3, baz=2, bar=1) #memoized

Note that you can also extend key_memoized and override its key() method to provide more specific memoization strategies, e.g. to ignore some of the arguments:

class memoize_by_bar(key_memoized):
    def key(self, args, kwargs):
        return self.normalize_args(args, kwargs)['bar']

@memoize_by_bar
def foo(bar, baz, spam):
    print 'calling foo: bar=%r baz=%r spam=%r' % (bar, baz, spam)
    return bar

print foo('x', 'ignore1', 'ignore2')
print foo('x', 'ignore3', 'ignore4')
like image 38
georg Avatar answered Nov 10 '22 00:11

georg