 

Cachetools for subsequent runs in python

Tags:

python

caching

I'm a beginner, so sorry if my question is basic. Does cachetools in Python work across subsequent runs?

import cachetools
import time


@cachetools.cached({})
def find_sum(n):
    time.sleep(5)
    return n+2


print(find_sum(2))
print(find_sum(3))
print(find_sum(2))

So during the first run the third call is faster, but the next time I run the file I want the first call to be fast too and take its result from the cache. Can cachetools do this?

Savitha Suresh asked Jun 21 '18


1 Answer

cachetools cannot do this out of the box. But it's very easy to add.

You can pass any mutable mapping you want to the memoizing decorators. You're using a plain old dict, and dicts are trivial to pickle. And even if you use one of the fancy cache implementations provided by the library, they're all easy to pickle as well.[1]
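For instance, here's a quick sketch (assuming a recent cachetools release) showing that one of the fancier cache classes survives a pickle round trip just like a plain dict:

```python
import pickle

import cachetools

# fill a bounded LRU cache with an entry
cache = cachetools.LRUCache(maxsize=5)
cache['a'] = 1

# serialize and restore it; the stored entry comes back intact
restored = pickle.loads(pickle.dumps(cache))
print(restored['a'])
```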

So:

import cachetools
import pickle
import time

try:
    with open('mycache.pickle', 'rb') as f:
        cache = pickle.load(f)
except FileNotFoundError:
    cache = {} # or cachetools.LRUCache(maxsize=5) or whatever you want

@cachetools.cached(cache)
def find_sum(n):
    time.sleep(5)
    return n+2

print(find_sum(2))
print(find_sum(3))
print(find_sum(2))

with open('mycache.pickle', 'wb') as f:
    pickle.dump(cache, f)

Of course you can add:

  • A finally or context manager or atexit to make sure you save your files at shutdown even if you hit an exception or ^C.
  • A timer that saves them every so often.
  • A hook on the cache object to save on every update, or every Nth update. (Just override the __setitem__ method—or see Extending cache classes for other things you can do, if you use one of the fancier classes instead of a dict.)
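As a sketch of the first option, the standard atexit module can register the save so it runs at normal interpreter shutdown (the cache name and file path here are just placeholders):

```python
import atexit
import pickle
import time

import cachetools

cache = {}  # any mutable mapping works here

def save_cache(path='mycache.pickle'):
    # dump the whole cache mapping to disk
    with open(path, 'wb') as f:
        pickle.dump(cache, f)

# run save_cache automatically at interpreter shutdown,
# including after an unhandled exception or ^C
atexit.register(save_cache)

@cachetools.cached(cache)
def find_sum(n):
    time.sleep(5)
    return n + 2
```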

Or you can even use an on-disk key-value database as a cache.

The simplest such database is dbm. It's limited to str/bytes for both keys and values. (You can use shelve if you want non-string values; if you want non-string keys, you probably want a different solution.) So, that doesn't quite work for our example. But for a similar example, it's almost magic:

import dbm
import time
import cachetools

cache = dbm.open('mycache.dbm', 'c')
@cachetools.cached(cache, key=lambda s:s)
def find_string_sum(s):
    time.sleep(5)
    return s + '!'

print(find_string_sum('a'))
print(find_string_sum('b'))
print(find_string_sum('a'))

The only tricky bit was that I had to override the key function. (The default key function handles *args, **kw, so for argument 'a' you end up with something like (('a',), ()), which is obviously not a string.)
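In current cachetools releases the default key function is cachetools.keys.hashkey, which wraps the positional arguments in a hashable tuple rather than passing them through unchanged; a quick check:

```python
import cachetools.keys

# the default key for a call like find_string_sum('a') is a
# tuple wrapping the arguments, not the string 'a' itself
print(cachetools.keys.hashkey('a'))
```

That tuple is fine as a dict key, but dbm only accepts str/bytes keys, which is why the example above overrides key to pass the string through unchanged.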


[1] As you can see from the source, there are bug fixes to make sure all of the classes are picklable in all supported Python versions, so this is clearly intentional.

abarnert answered Sep 20 '22