 

Python re module's cache clearing

While reading the documentation on Python's re module, I decided to have a look at the re.py source code.

When I opened it, I found this:

_cache = {}
_MAXCACHE = 100

def _compile(*key):
    cachekey = (type(key[0]),) + key
    p = _cache.get(cachekey)
    if p is not None:
        return p

    # ... code irrelevant to the question omitted ...

    if len(_cache) >= _MAXCACHE:
        _cache.clear()
    _cache[cachekey] = p
    return p
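One detail of the snippet worth noting is the `cachekey = (type(key[0]),) + key` line. A plausible reason for including the pattern's type in the key (this is my reading, not stated in the source) is that two patterns can compare equal while being of different types, e.g. a str and a unicode pattern in Python 2, and they must not share a cache slot. A minimal illustration with made-up values:

```python
# Sketch of why the cache key includes type(key[0]): the pattern and
# flags alone would let equal-but-differently-typed patterns collide.
# The (pattern, flags) tuple below is illustrative only.
key = ("ab?c", 0)                  # (pattern, flags) as passed to _compile
cachekey = (type(key[0]),) + key
print(cachekey)                    # (<class 'str'>, 'ab?c', 0)
```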

Why is the cache cleared with _cache.clear() when it reaches _MAXCACHE entries?

Is it a common approach to clear a cache completely and start from scratch?

Why not just delete the least recently used cached value instead?

ovgolovin asked Sep 22 '11

2 Answers

The point of caching is to decrease the average call time of the function. The overhead associated with keeping more information in _cache and pruning it instead of clearing it would increase that average call time. The _cache.clear() call will complete quickly, and even though you lose your cache this is preferable to maintaining a cache state and having the overhead of removing individual elements from the cache when the limit is reached.

There are a few things to think about when calculating the cache efficiency:

  1. Average call time on cache hits (very short)
  2. Average call time on cache misses (longer)
  3. Frequency of cache hits (fairly uncommon)
  4. Call time when cache is cleared or pruned (fairly uncommon)

The question is whether increasing #3 makes sense if it also means increasing #2 and #4. My guess is that it doesn't, or that the difference is negligible enough that keeping the code simple is preferable.

Andrew Clark answered Oct 31 '22


If I had to guess, I'd say it was done this way to avoid having to track when, or for how long, individual values have been in the cache, which would add both memory and processing overhead. Because the caching object is a plain dictionary, which is inherently unordered (at least prior to Python 3.7), there's no good way to know the order in which items were added without some additional bookkeeping structure. That could be addressed by using an OrderedDict in place of a standard dictionary, assuming you're working with Python >= 2.7; otherwise, you'd need to significantly redesign the way the caching was implemented in order to eliminate the need for a clear().

g.d.d.c answered Oct 31 '22