Why does Python's lru_cache perform best when maxsize is a power-of-two?

The documentation says this:

If maxsize is set to None, the LRU feature is disabled and the cache can grow without bound. The LRU feature performs best when maxsize is a power-of-two.

Would anyone happen to know where this "power-of-two" advice comes from? I am guessing it has something to do with the implementation.

asked Dec 23 '22 by Mr Matrix

1 Answer

Where the size effect arises

The lru_cache() code exercises its underlying dictionary in an atypical way. While maintaining a constant total size, each cache miss deletes the oldest item and inserts a new one.

The power-of-two suggestion is an artifact of how this delete-and-insert pattern interacts with the underlying dictionary implementation.
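
To make the pattern concrete, here is a minimal sketch of that dictionary traffic (my illustration only — the real lru_cache() pairs the dictionary with a doubly linked list to track recency):

def churn(n_live, misses):
    'Mimic a full cache: every miss evicts the oldest key and adds a new one'
    d = dict.fromkeys(range(n_live))    # the cache is full: n_live active keys
    oldest, newest = 0, n_live
    for _ in range(misses):
        del d[oldest]                   # delete the oldest item ...
        d[newest] = None                # ... and insert a new one
        oldest += 1
        newest += 1
    return d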

How dictionaries work

  • Table sizes are a power of two.
  • Deleted keys are replaced with dummy entries.
  • New keys can sometimes reuse the dummy slot, sometimes not.
  • Repeated delete-and-inserts with different keys will fill up the table with dummy entries.
  • An O(N) resize operation runs when the table is two-thirds full (see the sketch after this list).
  • Since the number of active entries remains constant, a resize operation doesn't actually change the table size.
  • The only effect of the resize is to clear the accumulated dummy entries.
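
Here is the promised sketch of the arithmetic those rules imply. The two-thirds threshold matches CPython's USABLE_FRACTION rule; the table sizes below are illustrative:

for table_size in (64, 128, 256):        # table sizes are always powers of two
    resize_point = table_size * 2 // 3   # a resize triggers at two-thirds full
    print(table_size, 'slots --> resize once', resize_point, 'are filled')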

Performance implications

A dict with 2**n active entries has the most available space for dummy entries, so the O(n) resizes occur less often.
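
As a worked example, assume a hypothetical steady-state table of 128 slots, so a resize point of 85. The headroom left for dummy entries, and hence the number of misses between O(n) resizes, shrinks quickly as maxsize approaches the resize point:

resize_point = 128 * 2 // 3              # 85, for a hypothetical 128-slot table
for maxsize in (64, 80, 84):
    headroom = resize_point - maxsize    # dummy slots available before a resize
    print('maxsize', maxsize, '--> a resize every', headroom, 'misses')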

Also, sparse dictionaries have fewer hash collisions than mostly full dictionaries. Collisions degrade dictionary performance.

When it matters

The lru_cache() only updates the dictionary when there is a cache miss, and every miss also calls the wrapped function. So the effect of resizes only matters when there is a high proportion of misses and the wrapped function is very cheap.

Far more important than giving the maxsize a power-of-two is using the largest reasonable maxsize. Bigger caches have more cache hits — that's where the big wins come from.
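
In practice, the hit rate is easy to check with the cache's cache_info() method. A toy example (the function and the maxsize here are illustrative, not recommendations):

from functools import lru_cache

@lru_cache(maxsize=4096)        # the largest reasonable size, not a tuned one
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

fib(200)
print(fib.cache_info())   # CacheInfo(hits=198, misses=201, maxsize=4096, currsize=201)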

Simulation

Once an lru_cache() is full and the first resize has occurred, the dictionary settles into a steady state and will never get larger. Here, we simulate what happens next as new dummy entries are added and periodic resizes clear them away.

steady_state_dict_size = 2 ** 7     # always a power of two

def simulate_lru_cache(lru_maxsize, events=1_000_000):
    'Count resize operations as dummy keys are added'
    resize_point = steady_state_dict_size * 2 // 3
    assert lru_maxsize < resize_point
    dummies = 0
    resizes = 0
    for i in range(events):
        dummies += 1
        filled = lru_maxsize + dummies
        if filled >= resize_point:
            dummies = 0
            resizes += 1
    work = resizes * lru_maxsize    # resizing is O(n)
    work_per_event = work / events
    print(lru_maxsize, '-->', resizes, work_per_event)

Running the simulation over a range of maxsize values:

for maxsize in range(42, 85):
    simulate_lru_cache(maxsize)

Here is an excerpt of the output:

42 --> 23255 0.97671
43 --> 23809 1.023787
44 --> 24390 1.07316
45 --> 25000 1.125
46 --> 25641 1.179486
  ...
80 --> 200000 16.0
81 --> 250000 20.25
82 --> 333333 27.333306
83 --> 500000 41.5
84 --> 1000000 84.0

What this shows is that the cache does significantly less work when maxsize is as far below the resize_point as possible.
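
The same figures fall out of a closed-form estimate: a resize fires once every resize_point - maxsize misses, and each resize costs O(maxsize), so the work per event is roughly maxsize / (resize_point - maxsize):

resize_point = 85                 # two-thirds of the 128-slot steady-state table
for maxsize in (42, 64, 84):
    print(maxsize, '-->', maxsize / (resize_point - maxsize))

The 42 and 84 rows reproduce the simulated 0.97671 and 84.0 figures above to within rounding.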

History

The effect was minimal in Python 3.2, when dictionaries grew by 4 x active_entries when resizing.

The effect became catastrophic when the growth rate was lowered to 2 x active_entries.

Later a compromise was reached, setting the growth rate to 3 x used. That significantly mitigated the issue by giving us a larger steady-state size by default.

A power-of-two maxsize is still the optimum setting, giving the least work for a given steady-state dictionary size, but it no longer matters as much as it did in Python 3.2.

Hope this helps clear up your understanding. :-)

answered Feb 16 '23 by Raymond Hettinger