Let's say I have a class Foo that stores some statistical data, and I want to encapsulate the access to these data using Python properties. This is particularly useful, for example, when I only store the variance of a variable and want to have access to its standard deviation: in that case, I could define the property Foo.std and make it return the square root of the variance.
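For concreteness, the setup described above could look roughly like the following. This is only a minimal sketch: the constructor argument and the public variance attribute are assumptions made for illustration.

```python
import math


class Foo:
    """Hypothetical class that stores only the variance."""

    def __init__(self, variance):
        self.variance = variance

    @property
    def std(self):
        # The square root is recomputed on every single access.
        return math.sqrt(self.variance)


foo = Foo(variance=4.0)
print(foo.std)  # 2.0
```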
The problem with this approach is that if I need to access Foo.std multiple times, I will calculate the square root each time; furthermore, since the notation of a property is exactly like that of an attribute, the user of my class might not be aware that a computation is taking place every time the property is accessed.
One alternative in this example would be to calculate the standard deviation every time I update my variance, and set it as an attribute. However, that would be inefficient if I don't need to access it at every update.
My question is: what is the best approach to use a Python property efficiently, when you need to perform a costly calculation? Should I cache the result after the first call and delete the cache in case of an update? Or should I rather not use a property and use a method Foo.get_std() instead?
Usually you can do this through caching. For example you can write:
class Foo:

    def __init__(self, also, other, arguments):
        # ...
        self._std = None  # None means "not calculated yet"

    @property
    def std(self):
        if self._std is None:
            # ... calculate standard deviation
            self._std = ...
        return self._std

    def alter_state(self, some, arguments):
        # update state
        self._std = None  # invalidate the cached value
So here we have a property std but also an attribute _std. If the standard deviation has not been calculated yet, or you alter the object state such that the standard deviation might have changed, we set _std to None. Now if we access .std, we first check whether _std is None. If that is the case, we calculate the standard deviation, store it in _std and return it, so that, as long as the object is not changed, we can later simply retrieve it.
If we alter the object such that the standard deviation might have changed, we set _std back to None, to force re-evaluation in case we access .std again.
If we alter the state of a Foo object twice before recalculating the standard deviation, we will only recalculate it once. So you can frequently change the Foo object with (close to) no extra cost involved (except setting self._std to None). If you have a huge dataset and you update it extensively, you will only put effort into calculating the standard deviation again when you actually need it.
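To make the "recalculate only once" behaviour visible, here is a small runnable sketch. The list of values, the append method and the recomputations counter are assumptions added purely for illustration, and statistics.pstdev stands in for whatever calculation the class actually needs.

```python
import statistics


class Foo:
    def __init__(self, values):
        self._values = list(values)
        self._std = None          # cache; None means "not calculated yet"
        self.recomputations = 0   # hypothetical counter, for illustration only

    @property
    def std(self):
        if self._std is None:
            self.recomputations += 1
            self._std = statistics.pstdev(self._values)
        return self._std

    def append(self, value):
        self._values.append(value)
        self._std = None  # invalidate only; nothing is calculated here


foo = Foo([1.0, 2.0, 3.0])
foo.std                    # first access: calculated
foo.std                    # second access: served from the cache
foo.append(4.0)
foo.append(5.0)            # two updates, still no recalculation
foo.std                    # calculated once for both updates
print(foo.recomputations)  # 2
```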
Furthermore, this can also be an opportunity to update statistical measures directly when that is (very) cheap. Say, for instance, you have a list of objects that you frequently update in bulk. If you increment all elements by a constant, then the mean also increases by that constant. So functions that alter the state in a way that lets some metrics be adjusted easily might update those metrics instead of resetting them to None.
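As a rough sketch of that idea, the class below adjusts a cached mean in place when every element is shifted by the same constant. The class name Samples and the method name shift are made up for this example.

```python
import statistics


class Samples:
    def __init__(self, values):
        self._values = list(values)
        self._mean = None  # cache; None means "not calculated yet"

    @property
    def mean(self):
        if self._mean is None:
            self._mean = statistics.fmean(self._values)
        return self._mean

    def shift(self, constant):
        """Add the same constant to every element."""
        self._values = [v + constant for v in self._values]
        if self._mean is not None:
            # Cheap update: the mean shifts by exactly the same constant,
            # so the cache does not have to be invalidated.
            self._mean += constant


samples = Samples([1.0, 2.0, 3.0])
print(samples.mean)  # 2.0
samples.shift(10.0)
print(samples.mean)  # 12.0
```

If the mean had not been calculated yet, shift leaves the cache at None and the first access to mean calculates it from the shifted values.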
Note that whether .std is a property or a method is irrelevant. The user does not have to know how often the value has to be calculated. Either way, std guarantees that once the value has been calculated, a second retrieval will be quite fast.
Adding to Willem's answer: starting with Python 3.8, we have functools.cached_property. The official documentation even uses std and variance as examples. I'm linking the 3.9 documentation (https://docs.python.org/3.9/library/functools.html#functools.cached_property) since it has additional explanation on how it works.
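A minimal sketch of how this could be applied to the Foo example, assuming the data is kept in a list and changed through a hypothetical append method. cached_property stores the computed value in the instance's __dict__, so removing that entry is one way to invalidate it after an update.

```python
import statistics
from functools import cached_property


class Foo:
    def __init__(self, values):
        self._values = list(values)

    @cached_property
    def std(self):
        # Runs once; the result is then stored on the instance.
        return statistics.pstdev(self._values)

    def append(self, value):
        self._values.append(value)
        # Drop the cached value. pop() with a default avoids an error
        # when std has not been accessed (and cached) yet.
        self.__dict__.pop("std", None)


foo = Foo([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(foo.std)   # 2.0
foo.append(9.0)  # cache cleared; the next access recalculates
```

Compared with the hand-written _std attribute, this removes the None check from the property, but the class still has to clear the cache itself whenever the underlying data changes.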