Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python's new `functools.cached_property` bug or limitation?

Since Python 3.8, functools has a cached_property. I've been using a similar lazyprop decorator based on Beazley's cookbook (code below), but when I replace by the builtin, I get problems. Here's one of them.

When I use the decorator within the class definition, using the @ operator, it doesn't complain.

But if I use it with setattr, I get:

TypeError: Cannot use cached_property instance without calling __set_name__ on it.

Beazley's simple version works fine though.

from functools import cached_property

class lazyprop:
    """Based on code from David Beazley's "Python Cookbook"."""
    def __init__(self, func):
        self.__doc__ = getattr(func, '__doc__')
        self.func = func

    def __get__(self, instance, cls):
        if instance is None:
            return self
        else:
            value = instance.__dict__[self.func.__name__] = self.func(instance)
            return value

class D:
    def __init__(self, greeting='Hello'):
        self.greeting = greeting

def greet(self):
    return self.greeting + " world!"


# Beazley's version works...
D.greet = lazyprop(greet)
assert D().greet == "Hello world!"

# ... but the builtin version will fail
D.greet = cached_property(greet)
# D().greet  # this will fail
like image 732
thorwhalen Avatar asked Jun 02 '20 19:06

thorwhalen


2 Answers

TL;DR The cache is the instance dict itself, and the name of the property is needed as the key. The chosen design imposes the limitation, but (IMO) it's a good compromise. lazyprop is not thread-safe, or at least may call self.func more than is strictly necessary in a multi-threaded environment.


To start, it is documented that __set_name__ is only called if the assignment occurs in the context of type creation. The note in the documentation (adapted to your example) shows what you need to do: call __set_name__ yourself.

class D:
    def __init__(self, greeting='Hello'):
        self.greeting = greeting

def greet(self):
    return self.greeting + " world!"

D.greet = cached_property(greet)
D.greet.__set_name__(D, 'greet')  # sets D.greet.attrname = "greet" for later use

assert D().greet == "hello world!"

Why is the attribute name needed? cached_property.__set__ is not defined, so given d = D(), d.greet will first look for an instance attribute named greet in d.__dict__. If it is not found, then D.greet.__get__(d, D) will be called. That function basically does three things: it calls the original greet function if needed to compute the value, then saves it to a new instance attribute with the same name, and then returns the computed value.

"Wait", you ask, "what do you mean, 'if needed'? Didn't you just say D.greet.__get__ is only called if the instance attribute doesn't already exist?" Yes, but in a multithreaded environment, you don't know if another thread might also be executing D.greet.__get__ at the same time. In order to prevent a race condition, __get__ goes through the following steps (you can follow along in the code if you like):

  1. Check for an instance attribute with the same name, in case it was created in another thread after the current call started.
  2. If not, try to get a lock so we can create the instance attribute ourselves.
  3. Once you get the lock, look for the instance attribute again, in case someone created it while we waited for the lock.
  4. If the instance attribute still does not exist, we can safely create it ourselves.
  5. Finally, whether we pulled a value from the instance attribute or called the original greet function ourselves, we can return the value.

With all this in mind, I would call this a limitation rather than a bug, but a limitation that is easily worked around. This implementation is probably simpler than one that tries not to rely on the name of the property itself for maintaining the necessary mapping.

like image 107
chepner Avatar answered Sep 22 '22 16:09

chepner


This is neither a bug nor a limitation. Using __set_name__ is merely a different means to deduce the property name - one that is more robust when used with regular class syntax.

For example, __set_name__ also works for anonymous functions with only the cached_property bound directly to a name:

from functools import cached_property
import random

class Bar:
    foo = cached_property(lambda self: random.random())

bar = bar()
print(bar.foo)  # 0.9901613829744336
print(bar.foo)  # 0.9901613829744336

When using cached_property.cached_property instead, the value is stored improperly – namely as bar.<lambda> – preventing it from shadowing the property.

>>> functools_bar.__dict__
{'foo': 0.9901613829744336}
>>> cached_property_bar.__dict__
{'<lambda>': 0.7003011051281254}
like image 36
MisterMiyagi Avatar answered Sep 22 '22 16:09

MisterMiyagi