According to the official Python documentation for the weakref module, the "primary use for weak references is to implement caches or mappings holding large objects,...". So I used a WeakValueDictionary to implement a caching mechanism for a long-running function. However, as it turned out, values in the cache almost never survived until they were actually used again, and had to be recomputed almost every time. Since there were no strong references to the values between accesses to the WeakValueDictionary, the GC got rid of them (even though there was no memory pressure at all).
Now, how am I supposed to use weak references to implement a cache? If I keep strong references somewhere explicitly, just to keep the GC from collecting my weakly referenced values, there would be no point in using a WeakValueDictionary in the first place. There should probably be some option that tells the GC: delete everything that has no references at all, but delete things with only weak references when memory is running out (or some threshold is exceeded). Is there something like that? Or is there a better strategy for this kind of cache?
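To make the problem concrete, here is a minimal sketch of the behaviour I am describing (the `Value` wrapper class is made up for the example, since built-in types like int cannot be weakly referenced directly; the immediate collection is CPython reference-counting behaviour):

```python
import weakref


class Value:
    """Hypothetical wrapper: built-ins like int cannot be weakly referenced."""
    def __init__(self, n):
        self.n = n


cache = weakref.WeakValueDictionary()

cache[1] = Value(42)   # no strong reference survives this statement, so on
print(cache.get(1))    # CPython the entry is already gone: prints None

keep = Value(7)        # a strong reference elsewhere keeps the entry alive
cache[2] = keep
print(cache.get(2).n)  # prints 7
```

The entry only stays in the dictionary for as long as something else holds a strong reference to the value.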
I'll attempt to answer your question with an example of how to use the weakref module to implement caching. We'll keep the cache's weak references in a weakref.WeakValueDictionary, and its strong references in a collections.deque, because a deque has a maxlen parameter that bounds how many objects it holds on to. Implemented in function-closure style:
import weakref, collections

def createLRUCache(factory, maxlen=64):
    weak = weakref.WeakValueDictionary()       # all entries, weakly referenced
    strong = collections.deque(maxlen=maxlen)  # keeps the newest maxlen values alive
    notFound = object()                        # sentinel: None may be a valid value
    def fetch(key):
        value = weak.get(key, notFound)
        if value is notFound:
            weak[key] = value = factory(key)
            strong.append(value)               # hold a strong reference for a while
        return value
    return fetch
The deque object will only keep the last maxlen entries, simply dropping its references to the oldest entries once it reaches capacity. When the old entries are dropped and garbage collected by Python, the WeakValueDictionary removes those keys from the map. Hence, the combination of the two objects helps us keep only maxlen entries in our LRU cache.
class Silly(object):
    def __init__(self, v):
        self.v = v

def fib(i):
    if i > 1:
        return Silly(_fibCache(i-1).v + _fibCache(i-2).v)
    elif i:
        return Silly(1)
    else:
        return Silly(0)

_fibCache = createLRUCache(fib)
It looks like there is no way to overcome this limitation, at least in CPython 2.7 and 3.0.
Reflecting on the createLRUCache() solution:

The solution with createLRUCache(factory, maxlen=64) does not meet my expectations. The idea of binding to 'maxlen' is something I would like to avoid: it forces me to specify some non-scalable constant, or to invent a heuristic for deciding which constant suits this or that host's memory limits.

I would prefer that the GC eliminate unreferenced values from the WeakValueDictionary not immediately, but only under the condition used for a regular GC run:
When the number of allocations minus the number of deallocations exceeds threshold0, collection starts.
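For reference, those thresholds are exposed through the gc module. Tuning them would not help here anyway: they only control the cyclic collector, while CPython frees a weakly referenced object through reference counting the moment its last strong reference disappears, regardless of any threshold.

```python
import gc

# threshold0 governs when a generation-0 collection starts: when
# allocations minus deallocations exceed it.
old = gc.get_threshold()
print(old)                      # e.g. (700, 10, 10) on CPython

# Raising threshold0 makes cyclic collections rarer, but does NOT keep
# weakly referenced objects alive.
gc.set_threshold(10000, 10, 10)
print(gc.get_threshold())       # -> (10000, 10, 10)

gc.set_threshold(*old)          # restore the original settings
```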