How can data remain persistent across multiple calls of decorated function?

The following function is meant to be used as a decorator that caches the results of previous calls. If the function has already been called with the same arguments, it returns the value stored in the cache dictionary:

def cached(f):
    f.cache = {}
    def _cachedf(*args):
        if args not in f.cache:
            f.cache[args] = f(*args)

        return f.cache[args]

    return _cachedf

I realized (by mistake) that cache does not need to be an attribute of the function object. As a matter of fact, the following code works just as well:

def cached(f):
    cache = {}   # <---- not an attribute this time!
    def _cachedf(*args):
        if args not in cache:
            cache[args] = f(*args)

        return cache[args]
    return _cachedf

I am having a hard time understanding how the cache object can remain persistent across multiple calls. I tried calling several cached functions multiple times and could not find any conflicts or problems.

Can anyone please help me understand how the cache variable still exists even after the _cachedf function is returned?

asked Aug 06 '12 by rahmu

2 Answers

You are creating a closure here: The function _cachedf() closes over the variable cache from the enclosing scope. This keeps cache alive as long as the function object lives.

Edit: Maybe I should add a few more details on how this works in Python and how CPython implements this.

Let's look at a simpler example:

def f():
    a = []
    def g():
        a.append(1)
        return len(a)
    return g

Example usage in the interactive interpreter:

>>> h = f()
>>> h()
1
>>> h()
2
>>> h()
3

During compilation of the module containing the function f(), the compiler sees that the function g() references the name a from the enclosing scope and memorises this external reference in the code object corresponding to the function f() (specifically, it adds the name a to f.__code__.co_cellvars).

So what happens when the function f() is called? The first line creates a new list object and binds it to the name a. The next line creates a new function object (using the code object created during the compilation of the module) and binds it to the name g. The body of g() isn't executed at this point; finally, the function object is returned.

Since the code object of f() has a note that the name a is referenced by local functions, a "cell" for this name is created when f() is entered. This cell contains the reference to the actual list object a is bound to, and the function g() gets a reference to this cell. That way, the list object and the cell are kept alive even after the function f() exits.
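You can see these pieces directly from the interpreter. This sketch reuses the example above and inspects the code objects and the closure cell:

```python
def f():
    a = []
    def g():
        a.append(1)
        return len(a)
    return g

h = f()
h()
h()

# The compiler recorded the shared name in both code objects:
print(f.__code__.co_cellvars)  # ('a',)  -- names f creates cells for
print(h.__code__.co_freevars)  # ('a',)  -- names g pulls from enclosing cells

# The cell itself survives in the closure of the returned function:
cell = h.__closure__[0]
print(cell.cell_contents)      # [1, 1]  -- the list kept alive by the cell
```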

answered Oct 26 '22 by Sven Marnach


Can anyone please help me understand how the cache variable still exists even after the _cachedf function is returned?

It has to do with Python's reference-counting garbage collector. The cache variable is kept alive and accessible because the function _cachedf holds a reference to it, and the caller of cached holds a reference to _cachedf. When you call the function again, you are still using the same function object that was originally created, so you still have access to the cache.

You won't lose the cache until all references to it are destroyed. You can use the del statement to do that.

For example:

>>> import time
>>> def cached(f):
...     cache = {}   # <---- not an attribute this time!
...     def _cachedf(*args):
...         if args not in cache:
...             cache[args] = f(*args)
...         return cache[args]
...     return _cachedf
...     
... 
>>> def foo(duration):
...     time.sleep(duration)
...     return True
...     
... 
>>> bob = cached(foo)
>>> bob(2) # Takes two seconds
True
>>> bob(2) # returns instantly
True
>>> del bob # Deletes reference to bob (aka _cachedf) which holds ref to cache
>>> bob = cached(foo)
>>> bob(2) # takes two seconds
True
>>> 

For the record, what you're trying to achieve is called memoization, and a more complete memoizing decorator is available from the decorator pattern page, implemented as a decorator class. Your code and the class-based decorator are essentially the same, except that the class-based decorator checks arguments for hashability before storing results.
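As an aside (not part of the original answer): since Python 3.2 the standard library ships functools.lru_cache, which implements this same closure-style memoization with argument hashing and cache statistics built in:

```python
from functools import lru_cache

@lru_cache(maxsize=None)  # unbounded cache, like the hand-rolled dict
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(30))  # 832040 -- computed in linear time thanks to the cache
print(fib.cache_info())  # hit/miss counters the decorator maintains
```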


Edit (2017-02-02): @SiminJie comments that cached(foo)(2) always incurs a delay.

This is because cached(foo) returns a new wrapper function with its own fresh (empty) cache every time it is called, so cached(foo)(2) creates an empty cache and immediately calls the wrapped function through it.

Since the cache is empty, the lookup misses and the underlying function runs again. Instead, do cached_foo = cached(foo) once and then call cached_foo(2) multiple times; only the first call incurs the delay. Used as a decorator, it works as expected:

@cached
def my_long_function(arg1, arg2):
    return long_operation(arg1, arg2)

my_long_function(1, 2)  # incurs delay
my_long_function(1, 2)  # doesn't

If you're not familiar with decorators, take a look at this answer to understand what the above code means.
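To address the "could not find any conflict or problems" point from the question: every call to cached creates a brand-new cache dict in its own closure, so independently decorated functions never share entries. A minimal sketch using the decorator from the question:

```python
def cached(f):
    cache = {}
    def _cachedf(*args):
        if args not in cache:
            cache[args] = f(*args)
        return cache[args]
    return _cachedf

@cached
def double(x):
    return 2 * x

@cached
def square(x):
    return x * x

print(double(3))  # 6
print(square(3))  # 9

# Each wrapper closes over its own cache cell -- no cross-talk:
print(double.__closure__ is square.__closure__)  # False
```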

answered Oct 26 '22 by brice