
How do I create a bounded memoization decorator in Python?

Obviously, a quick search yields a million implementations and flavors of the memoization decorator in Python. However, I am interested in a flavor that I haven't been able to find. I would like to have it such that the cache of stored values can be of a fixed capacity. When new elements are added, if the capacity is reached, then the oldest value is removed and is replaced with the newest value.

My concern is that, if I use memoization to store a great many elements, then the program will crash because of a lack of memory. (I don't know how well-placed this concern may be in practice.) If the cache were of a fixed size, then a memory error would not be an issue. And many problems that I work on change as the program executes so that initial cached values would look very different from later cached values (and would be much less likely to recur later). That's why I'd like the oldest stuff to be replaced by the newest stuff.

I found the OrderedDict class and an example showing how to subclass it to specify a maximum size. I'd like to use that as my cache, rather than a normal dict. The problem is, I need the memoize decorator to take a parameter called maxlen that defaults to None. If it is None, then the cache is boundless and operates as normal. Any other value is used as the size for the cache.

I want it to work like the following:

@memoize
def some_function(spam, eggs):
    # This would use the boundless cache.
    pass

and

@memoize(200)  # or @memoize(maxlen=200)
def some_function(spam, eggs):
    # This would use the bounded cache of size 200.
    pass

Below is the code that I have so far, but I don't see how to pass the parameter into the decorator while making it work both "naked" and with a parameter.

import collections
import functools

class BoundedOrderedDict(collections.OrderedDict):
    def __init__(self, *args, **kwds):
        self.maxlen = kwds.pop("maxlen", None)
        collections.OrderedDict.__init__(self, *args, **kwds)
        self._checklen()

    def __setitem__(self, key, value):
        collections.OrderedDict.__setitem__(self, key, value)
        self._checklen()

    def _checklen(self):
        if self.maxlen is not None:
            while len(self) > self.maxlen:
                self.popitem(last=False)

def memoize(function):
    cache = BoundedOrderedDict()  # I want this to take maxlen as an argument
    @functools.wraps(function)
    def memo_target(*args):
        lookup_value = args
        if lookup_value not in cache:
            cache[lookup_value] = function(*args)
        return cache[lookup_value]
    return memo_target

@memoize
def fib(n):
    if n < 2: return 1
    return fib(n-1) + fib(n-2)

if __name__ == '__main__':
    x = fib(50)
    print(x)

Edit: Using Ben's suggestion, I created the following decorator, which I believe works the way I imagined. It's important to me to be able to use these decorated functions with multiprocessing, and that has been an issue in the past. But a quick test of this code seemed to work correctly, even when farming out the jobs to a pool of threads.

def memoize(func=None, maxlen=None):
    if func:
        cache = BoundedOrderedDict(maxlen=maxlen)
        @functools.wraps(func)
        def memo_target(*args):
            lookup_value = args
            if lookup_value not in cache:
                cache[lookup_value] = func(*args)
            return cache[lookup_value]
        return memo_target
    else:
        def memoize_factory(func):
            return memoize(func, maxlen=maxlen)
        return memoize_factory
agarrett asked Feb 22 '12 04:02



1 Answer

@memoize
def some_function(spam, eggs):
    # This would use the boundless cache.
    pass

Here memoize is used as a function that is called on a single function argument, and returns a function. memoize is a decorator.

@memoize(200)  # or @memoize(maxlen=200)
def some_function(spam, eggs):
    # This would use the bounded cache of size 200.
    pass

Here memoize is used as a function that is called on a single integer argument and returns a function, and that returned function is itself used as a decorator i.e. it is called on a single function argument and returns a function. memoize is a decorator factory.
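To make the distinction concrete, here is a small sketch of what the @ syntax expands to in each case. The names plain_decorator and decorator_factory are hypothetical stand-ins that do no caching; they only show the two call shapes.

```python
import functools

def plain_decorator(func):
    """Decorator: takes a function, returns a function."""
    @functools.wraps(func)
    def wrapper(*args):
        return func(*args)
    wrapper.maxlen = None  # illustrative marker only
    return wrapper

def decorator_factory(maxlen):
    """Decorator factory: takes a setting, returns a decorator."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args):
            return func(*args)
        wrapper.maxlen = maxlen  # illustrative marker only
        return wrapper
    return decorator

def add(a, b):
    return a + b

# Writing "@plain_decorator" above a def is shorthand for:
add_plain = plain_decorator(add)

# Writing "@decorator_factory(200)" above a def is shorthand for:
add_bounded = decorator_factory(200)(add)
```

The second form involves two calls: one that receives 200 and one that receives the function. That extra call is what a single memoize has to detect in order to serve both roles.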

So to unify these two, you're going to have to write some ugly code. The way I would probably do it is to have memoize look like this:

def memoize(func=None, maxlen=None):
    if func:
        # act as decorator: wrap func and return the memoized function
        ...
    else:
        # act as decorator factory: return a decorator that remembers maxlen
        ...

This way if you want to pass parameters you always pass them as keyword arguments, leaving func (which should be a positional parameter) unset, and if you just want everything to default it will magically work as a decorator directly. This does mean @memoize(200) will give you an error; you could avoid that by instead doing some type checking to see whether func is callable, which should work well in practice but isn't really very "pythonic".
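As a rough sketch of the callable check mentioned above, the version below treats a non-callable first argument as maxlen, so that @memoize, @memoize(200) and @memoize(maxlen=200) are all accepted. It is one possible arrangement, not the only one. For brevity it uses a plain collections.OrderedDict with inline eviction instead of the question's BoundedOrderedDict, and the cache attribute on the wrapper is an addition made here purely for inspection.

```python
import collections
import functools

def memoize(func=None, maxlen=None):
    """Usable as @memoize, @memoize(200), or @memoize(maxlen=200)."""
    if func is not None and not callable(func):
        # Called as memoize(200): the positional argument is really maxlen.
        func, maxlen = None, func
    if func is None:
        # Decorator-factory mode: return a decorator that remembers maxlen.
        return functools.partial(memoize, maxlen=maxlen)

    cache = collections.OrderedDict()

    @functools.wraps(func)
    def memo_target(*args):
        if args in cache:
            return cache[args]
        result = func(*args)
        cache[args] = result
        if maxlen is not None and len(cache) > maxlen:
            cache.popitem(last=False)  # evict the oldest entry
        return result

    memo_target.cache = cache  # exposed for inspection in this sketch only
    return memo_target

@memoize(2)
def square(x):
    return x * x

for n in range(5):
    square(n)
print(list(square.cache))  # [(3,), (4,)]
```

The cost of this convenience is that a callable can never be passed as maxlen, which is harmless here but is the kind of ambiguity that makes the type check feel unpythonic.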

An alternative would be to have two different decorators, say memoize and bounded_memoize. The unbounded memoize can have a trivial implementation by just calling bounded_memoize with maxlen set to None, so it doesn't cost you anything in implementation or maintenance.
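A minimal sketch of that two-decorator arrangement follows. The names bounded_memoize and memoize come from the paragraph above; the cache details (a plain collections.OrderedDict with inline eviction, and a cache attribute for inspection) are assumptions made here to keep the example self-contained.

```python
import collections
import functools

def bounded_memoize(maxlen=None):
    """Decorator factory: always called, e.g. @bounded_memoize(200)."""
    def decorator(func):
        cache = collections.OrderedDict()

        @functools.wraps(func)
        def memo_target(*args):
            if args in cache:
                return cache[args]
            result = func(*args)
            cache[args] = result
            if maxlen is not None and len(cache) > maxlen:
                cache.popitem(last=False)  # drop the oldest entry
            return result

        memo_target.cache = cache  # for inspection in this sketch only
        return memo_target
    return decorator

# The unbounded decorator is simply the bounded one with no limit.
memoize = bounded_memoize(maxlen=None)

@memoize
def fib(n):
    return 1 if n < 2 else fib(n - 1) + fib(n - 2)

@bounded_memoize(3)
def double(x):
    return 2 * x
```

Each function decorated with memoize still gets its own cache, because the cache is created inside decorator, once per decorated function.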

Normally, as a rule of thumb, I try to avoid mangling a function to implement two only-tangentially related sets of functionality, especially when they have such different signatures. But in this case it does make the use of the decorator natural (requiring @memoize() would be quite error prone, even though it's more consistent from a theoretical perspective), and you're presumably going to implement this once and use it many times, so readability at the point of use is probably the more important concern.

Ben answered Oct 10 '22 10:10