The following function is meant to be used as a decorator that stores the results of already computed calls. If the arguments have already been seen before, the function returns the value stored in the cache dictionary:
def cached(f):
    f.cache = {}
    def _cachedf(*args):
        if args not in f.cache:
            f.cache[args] = f(*args)
        return f.cache[args]
    return _cachedf
I realized (by accident) that cache does not need to be an attribute of the function object. In fact, the following code works just as well:
def cached(f):
    cache = {}  # <---- not an attribute this time!
    def _cachedf(*args):
        if args not in cache:
            cache[args] = f(*args)
        return cache[args]
    return _cachedf
I am having a hard time understanding how the cache object can persist across multiple calls. I tried calling multiple cached functions several times and could not find any conflicts or problems. Can anyone please help me understand how the cache variable still exists even after the _cachedf function is returned?
You are creating a closure here: the function _cachedf() closes over the variable cache from the enclosing scope. This keeps cache alive as long as the function object lives.
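You can see the closure directly by inspecting the inner function's __closure__ attribute. A short sketch (the square function here is just an illustration, not part of the original code):

```python
def cached(f):
    cache = {}
    def _cachedf(*args):
        if args not in cache:
            cache[args] = f(*args)
        return cache[args]
    return _cachedf

def square(x):
    return x * x

cached_square = cached(square)
cached_square(3)

# _cachedf closed over two names from the enclosing scope: cache and f.
# co_freevars lists their names; __closure__ holds one "cell" per name,
# and cell_contents is the captured object itself.
for name, cell in zip(cached_square.__code__.co_freevars,
                      cached_square.__closure__):
    print(name, "->", cell.cell_contents)
```

As long as cached_square exists, its closure cells keep the cache dictionary (and the wrapped function) reachable, so the garbage collector never frees them.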
Edit: Maybe I should add a few more details on how this works in Python and how CPython implements this.
Let's look at a simpler example:
def f():
    a = []
    def g():
        a.append(1)
        return len(a)
    return g
Example usage in the interactive interpreter:
>>> h = f()
>>> h()
1
>>> h()
2
>>> h()
3
During compilation of the module containing the function f(), the compiler sees that the function g() references the name a from the enclosing scope and memorises this external reference in the code object corresponding to the function f() (specifically, it adds the name a to f.__code__.co_cellvars).
So what happens when the function f() is called? The first line creates a new list object and binds it to the name a. The next line creates a new function object (using the code object created during the compilation of the module) and binds it to the name g. The body of g() isn't executed at this point, and finally the function object is returned.
Since the code object of f() has a note that the name a is referenced by local functions, a "cell" for this name is created when f() is entered. This cell contains the reference to the actual list object a is bound to, and the function g() gets a reference to this cell. That way, the list object and the cell are kept alive even when the function f() exits.
Can anyone please help me understand how the cache variable still exists even after the _cachedf function is returned?
It has to do with Python's reference-counting garbage collector. The cache variable will be conserved and accessible since the function _cachedf has a reference to it, and the caller of cached has a reference to that. When you call the function again, you are still using the same function object that was originally created, hence you still have access to the cache.
You won't lose the cache until all references to it are destroyed. You can use the del statement to do that.
For example:
>>> import time
>>> def cached(f):
...     cache = {}  # <---- not an attribute this time!
...     def _cachedf(*args):
...         if args not in cache:
...             cache[args] = f(*args)
...         return cache[args]
...     return _cachedf
...
>>> def foo(duration):
...     time.sleep(duration)
...     return True
...
>>> bob = cached(foo)
>>> bob(2) # Takes two seconds
True
>>> bob(2) # returns instantly
True
>>> del bob # Deletes reference to bob (aka _cachedf) which holds ref to cache
>>> bob = cached(foo)
>>> bob(2) # takes two seconds
True
>>>
For the record, what you're trying to achieve is called memoization, and there is a more complete memoizing decorator available from the decorator pattern page which does the same thing, but using a decorator class. Your code and the class-based decorator are essentially the same, with the class-based decorator checking for hashability before storing.
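A minimal sketch of such a class-based memoizing decorator might look like the following (the name Memoized is hypothetical; the version on the decorator pattern page differs in detail, but the idea is the same):

```python
import functools

class Memoized:
    """Class-based memoizing decorator: the instance is callable and
    caches results keyed by the positional arguments."""

    def __init__(self, func):
        self.func = func
        self.cache = {}
        functools.update_wrapper(self, func)

    def __call__(self, *args):
        try:
            return self.cache[args]
        except KeyError:
            # Not cached yet: compute, store, and return.
            result = self.func(*args)
            self.cache[args] = result
            return result
        except TypeError:
            # Unhashable arguments (e.g. lists): skip caching entirely.
            return self.func(*args)

@Memoized
def add(a, b):
    return a + b

print(add(1, 2))  # computed on the first call
print(add(1, 2))  # served from add.cache on the second
```

Here the cache lives as an instance attribute rather than in a closure, but the lifetime argument is identical: the cache survives as long as the decorated callable object does.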
Edit (2017-02-02): @SiminJie comments that cached(foo)(2) always incurs a delay. This is because cached(foo) returns a new function with a fresh, empty cache, which is then called immediately. Since the cache is empty, the lookup fails and the underlying function runs again. Instead, do cached_foo = cached(foo) and then call cached_foo(2) multiple times; only the first call incurs the delay. Also, if used as a decorator, it will work as expected:
@cached
def my_long_function(arg1, arg2):
    return long_operation(arg1, arg2)

my_long_function(1, 2)  # incurs delay
my_long_function(1, 2)  # doesn't
If you're not familiar with decorators, take a look at this answer to understand what the above code means.