How to get functools.lru_cache to return new instances?

I use Python's lru_cache on a function which returns a mutable object, like so:

import functools

@functools.lru_cache()
def f():
    x = [0, 1, 2]  # Stand-in for some long computation
    return x

If I call this function, mutate the result and call it again, I do not obtain a "fresh", unmutated object:

a = f()
a.append(3)
b = f()
print(a)  # [0, 1, 2, 3]
print(b)  # [0, 1, 2, 3]
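The reason is that lru_cache hands back the very same cached object on every hit, which can be confirmed with an identity check (a small illustration, not part of the original question):

```python
import functools

@functools.lru_cache()
def f():
    return [0, 1, 2]

print(f() is f())  # True: both calls return the identical cached list object
```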

I get why this happens, but it's not what I want. A fix would be to leave the caller in charge of using list.copy:

a = f().copy()
a.append(3)
b = f().copy()
print(a)  # [0, 1, 2, 3]
print(b)  # [0, 1, 2]

However, I would like to fix this inside f. A pretty solution would be something like

@functools.lru_cache(copy=True)
def f():
    ...

though functools.lru_cache does not actually take a copy argument.

Any suggestion as to how to best implement this behavior?

Edit

Based on the answer from holdenweb, this is my final implementation. It behaves exactly like the built-in functools.lru_cache by default, and extends it with the copying behavior when copy=True is supplied.

import functools
from copy import deepcopy

def lru_cache(maxsize=128, typed=False, copy=False):
    if not copy:
        return functools.lru_cache(maxsize, typed)
    def decorator(f):
        cached_func = functools.lru_cache(maxsize, typed)(f)
        @functools.wraps(f)
        def wrapper(*args, **kwargs):
            return deepcopy(cached_func(*args, **kwargs))
        return wrapper
    return decorator

# Tests below

@lru_cache()
def f():
    x = [0, 1, 2]  # Stand-in for some long computation
    return x

a = f()
a.append(3)
b = f()
print(a)  # [0, 1, 2, 3]
print(b)  # [0, 1, 2, 3]

@lru_cache(copy=True)
def f():
    x = [0, 1, 2]  # Stand-in for some long computation
    return x

a = f()
a.append(3)
b = f()
print(a)  # [0, 1, 2, 3]
print(b)  # [0, 1, 2]
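One caveat with this wrapper: it hides the cache_info() and cache_clear() methods that functools.lru_cache normally attaches to the decorated function. If those are needed, they can be forwarded from the underlying cached function (a sketch, not part of the original implementation):

```python
import functools
from copy import deepcopy

def lru_cache(maxsize=128, typed=False, copy=False):
    if not copy:
        return functools.lru_cache(maxsize, typed)
    def decorator(f):
        cached_func = functools.lru_cache(maxsize, typed)(f)
        @functools.wraps(f)
        def wrapper(*args, **kwargs):
            return deepcopy(cached_func(*args, **kwargs))
        # Forward the cache controls from the underlying cached function
        wrapper.cache_info = cached_func.cache_info
        wrapper.cache_clear = cached_func.cache_clear
        return wrapper
    return decorator

@lru_cache(copy=True)
def g():
    return [0, 1, 2]

g()
g()
print(g.cache_info().hits, g.cache_info().misses)  # 1 1
```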
asked Feb 27 '19 by jmd_dk



1 Answer

Since the lru_cache decorator has unsuitable behaviour for you, the best you can do is to build your own decorator that returns a copy of what it gets from lru_cache. This will mean that the first call with a particular set of arguments will create two copies of the object, since now the cache will only be holding prototype objects.

This question is made more difficult because lru_cache can take arguments (maxsize and typed), so a call to lru_cache returns a decorator. Remembering that a decorator takes a function as its argument and (usually) returns a function, you will have to replace lru_cache with a function that takes two arguments and returns a function that in turn takes a function as its argument and returns a (wrapped) function, which is not an easy structure to wrap your head around.
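That three-level structure can be seen in miniature with a generic parameterized decorator (an illustrative sketch, unrelated to caching):

```python
import functools

calls = []

def repeat(times=2):               # outer level: receives the decorator arguments
    def decorator(f):              # middle level: receives the function to wrap
        @functools.wraps(f)
        def wrapper(*args, **kwargs):   # inner level: the replacement function
            result = None
            for _ in range(times):
                result = f(*args, **kwargs)
            return result
        return wrapper
    return decorator

@repeat(times=3)
def greet():
    calls.append("hi")

greet()
print(len(calls))  # 3
```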

You would then write your functions using the copying_lru_cache decorator instead of the standard one, which is now applied "manually" inside the updated decorator.

Depending on how heavy the mutations are, you might get away without using deepcopy, but you don't give enough information to determine that.
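To illustrate the difference: a shallow copy.copy would suffice for a flat result like the list in the question, but a nested structure would still share its inner objects with the cached value (a small demonstration, not from the original answer):

```python
from copy import copy, deepcopy

nested = [[0, 1], [2, 3]]
shallow = copy(nested)   # new outer list, but the inner lists are shared
deep = deepcopy(nested)  # fully independent structure

shallow[0].append(99)    # also visible through `nested`
print(nested[0])  # [0, 1, 99]
print(deep[0])    # [0, 1]
```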

So your code would then read

from functools import lru_cache, wraps
from copy import deepcopy

def copying_lru_cache(maxsize=10, typed=False):
    def decorator(f):
        cached_func = lru_cache(maxsize=maxsize, typed=typed)(f)
        @wraps(f)  # preserve the wrapped function's name and docstring
        def wrapper(*args, **kwargs):
            return deepcopy(cached_func(*args, **kwargs))
        return wrapper
    return decorator

@copying_lru_cache()
def f(arg):
    print(f"Called with {arg}")
    x = [0, 1, arg]  # Stand-in for some long computation
    return x

print(f(1), f(2), f(3), f(1))

This prints

Called with 1
Called with 2
Called with 3
[0, 1, 1] [0, 1, 2] [0, 1, 3] [0, 1, 1]

so the caching behaviour you require appears to be present. Note also that the documentation for lru_cache specifically warns that

In general, the LRU cache should only be used when you want to reuse previously computed values. Accordingly, it doesn’t make sense to cache functions with side-effects, functions that need to create distinct mutable objects on each call, or impure functions such as time() or random().

answered Sep 18 '22 by holdenweb