Problems with the GC when using a WeakValueDictionary for caches

According to the official Python documentation for the weakref module, the "primary use for weak references is to implement caches or mappings holding large objects, ...". So I used a WeakValueDictionary to implement a caching mechanism for a long-running function. However, as it turned out, values never stayed in the cache until they were actually used again, but had to be recomputed almost every time. Since there were no strong references between accesses to the values stored in the WeakValueDictionary, the GC got rid of them (even though there was no memory pressure at all).

Now, how am I supposed to use weak references to implement a cache? If I keep strong references somewhere explicitly to keep the GC from deleting my weakly referenced values, there would be no point in using a WeakValueDictionary in the first place. There should probably be some option telling the GC: delete everything that has no references at all, but delete things that have only weak references only when memory is running out (or some threshold is exceeded). Is there something like that? Or is there a better strategy for this kind of cache?
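To make the problem concrete, here is a minimal sketch of the behaviour described above (the BigObject class and compute function are made up for the illustration): a value kept only in a WeakValueDictionary disappears as soon as its last strong reference is dropped.

```python
import gc
import weakref

class BigObject(object):
    """Hypothetical expensive-to-compute value (plain ints cannot be weakly referenced)."""
    def __init__(self, data):
        self.data = data

cache = weakref.WeakValueDictionary()

def compute(key):
    value = BigObject(key * 2)
    cache[key] = value
    return value

result = compute(21)
print(21 in cache)   # True while `result` holds a strong reference
del result
gc.collect()         # not even needed in CPython, where refcounting frees it at once
print(21 in cache)   # False: the cache entry is already gone
```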

Asked Mar 13 '12 by Elmar Zander

2 Answers

I'll attempt to answer your question with an example of how to use the weakref module to implement caching. We'll keep our cache's weak references in a weakref.WeakValueDictionary, and the strong references in a collections.deque, because its maxlen parameter bounds how many objects it holds on to. Implemented in function-closure style:

import weakref, collections

def createLRUCache(factory, maxlen=64):
    # Weak references to every cached value; an entry vanishes automatically
    # once no strong reference to its value remains.
    weak = weakref.WeakValueDictionary()
    # Strong references to the maxlen most recently used values; appending
    # beyond maxlen silently drops the oldest reference.
    strong = collections.deque(maxlen=maxlen)

    notFound = object()  # sentinel, so a cached value of None is still a hit
    def fetch(key):
        value = weak.get(key, notFound)
        if value is notFound:
            weak[key] = value = factory(key)
        strong.append(value)  # mark as most recently used
        return value
    return fetch

The deque object keeps only the last maxlen entries, simply dropping its reference to the oldest entry once it reaches capacity. When an old entry is dropped and garbage collected by Python, the WeakValueDictionary removes that key from the map. Hence, the combination of the two objects keeps only the maxlen most recently used entries in our LRU cache.

class Silly(object):
    """Wrapper object: plain ints cannot be weakly referenced."""
    def __init__(self, v):
        self.v = v

def fib(i):
    # Recurse through the cache so intermediate results are reused.
    if i > 1:
        return Silly(_fibCache(i-1).v + _fibCache(i-2).v)
    elif i:
        return Silly(1)
    else:
        return Silly(0)

_fibCache = createLRUCache(fib)
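The eviction behaviour is easy to observe with a tiny maxlen. The following self-contained sketch repeats the answer's createLRUCache and uses an illustrative Box wrapper; it relies on CPython's reference counting, which frees dropped values immediately:

```python
import collections
import gc
import weakref

def createLRUCache(factory, maxlen=64):
    weak = weakref.WeakValueDictionary()
    strong = collections.deque(maxlen=maxlen)
    notFound = object()
    def fetch(key):
        value = weak.get(key, notFound)
        if value is notFound:
            weak[key] = value = factory(key)
        strong.append(value)
        return value
    return fetch

class Box(object):
    """Illustrative wrapper, since ints cannot be weakly referenced."""
    def __init__(self, v):
        self.v = v

fetch = createLRUCache(Box, maxlen=2)
w1 = weakref.ref(fetch(1))   # we keep only *weak* refs to the values
w2 = weakref.ref(fetch(2))
w3 = weakref.ref(fetch(3))   # deque is now full: Box(1) gets dropped
gc.collect()
print(w1() is None)   # True: evicted once the deque let go of it
print(w3() is None)   # False: still strongly held by the deque
```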
Answered Sep 28 '22 by Shane Holloway


It looks like there is no way to overcome this limitation, at least in CPython 2.7 and 3.0.

Reflecting on the createLRUCache() solution:

The solution with createLRUCache(factory, maxlen=64) does not match my expectations. The idea of binding to maxlen is something I would like to avoid: it would force me to specify some non-scalable constant, or to invent a heuristic for deciding which constant suits a given host's memory limits.

I would prefer that the GC eliminate unreferenced values from the WeakValueDictionary not straight away, but only under the condition used for a regular GC run:

When the number of allocations minus the number of deallocations exceeds threshold0, collection starts.
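For reference, that threshold0 (along with the other generational thresholds) can be inspected and tuned through the gc module. Note, though, that in CPython this does not change when weakly referenced values are freed, because reference counting (not the cyclic collector) reclaims them immediately:

```python
import gc

# The generational thresholds quoted above; the CPython default is
# typically (700, 10, 10).
original = gc.get_threshold()
print(original)

# threshold0 can be raised to delay generation-0 collection...
gc.set_threshold(10000, original[1], original[2])
print(gc.get_threshold())

# ...but this does not keep WeakValueDictionary entries alive in CPython,
# since reference counting frees them as soon as the last strong ref goes.
gc.set_threshold(*original)   # restore the previous thresholds
```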

Answered Sep 28 '22 by Oleksiy