 

How to use Python properties efficiently?

Let's say I have a class Foo that stores some statistical data, and I want to encapsulate the access to these data using Python properties. This is particularly useful, for example, when I only store the variance of a variable and want to have access to its standard deviation: in that case, I could define the property Foo.std and make it return the square root of the variance.
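A minimal sketch of that idea, assuming Foo simply holds a `variance` attribute (the class body is hypothetical), might look like this:

```python
import math


class Foo:
    """Hypothetical minimal class: stores only the variance, derives std on access."""

    def __init__(self, variance):
        self.variance = variance

    @property
    def std(self):
        # Recomputed on every access: nothing is cached here.
        return math.sqrt(self.variance)


foo = Foo(4.0)
print(foo.std)  # reads like an attribute, but calls math.sqrt each time
```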

The problem with this approach is that if I need to access Foo.std multiple times, I will calculate the square root each time; furthermore, since the notation of a property is exactly like that of an attribute, the user of my class might not be aware that a computation is taking place every time the property is accessed.

One alternative in this example would be to calculate the standard deviation every time I update my variance, and set it as an attribute. However, that would be inefficient if I don't need to access it at every update.
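A sketch of that eager alternative, assuming the variance is exposed through a property setter (the names are hypothetical), shows where the cost moves: the square root is paid on every update, whether or not std is ever read.

```python
import math


class Foo:
    """Hypothetical eager variant: std is recomputed whenever variance is set."""

    def __init__(self, variance):
        self.variance = variance  # goes through the setter below

    @property
    def variance(self):
        return self._variance

    @variance.setter
    def variance(self, value):
        self._variance = value
        # Paid on every update, even if std is never read afterwards.
        self.std = math.sqrt(value)
```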

My question is: what is the best approach to use a Python property efficiently, when you need to perform a costly calculation? Should I cache the result after the first call and delete the cache in case of an update? Or should I rather not use a property and use a method Foo.get_std() instead?

guissoares asked Nov 10 '17 23:11



2 Answers

Usually you can do this through caching. For example, you can write:

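import statistics


class Foo:

    def __init__(self, data):
        self._data = list(data)
        self._std = None  # cached value; None means "not computed yet"

    @property
    def std(self):
        if self._std is None:
            # calculate the standard deviation only when it is needed
            self._std = statistics.pstdev(self._data)
        return self._std

    def alter_state(self, new_data):
        # update state
        self._data = list(new_data)
        self._std = None  # invalidate the cache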

So here we have a property std but also an attribute _std. If the standard deviation has not been calculated yet, or if you alter the object state such that the standard deviation might have changed, we set _std to None. When we access .std, we first check whether _std is None. If it is, we calculate the standard deviation, store it in _std, and return it. As long as the object is not changed, later accesses simply retrieve the stored value.

If we alter the object such that the standard deviation might have changed, we set _std back to None, to force re-evaluation in case we access .std again.

If we alter the state of a Foo object twice before recalculating the standard deviation, we only recalculate it once. So you can change the Foo object frequently at (close to) no extra cost, apart from setting self._std to None. If you have a huge dataset that you update extensively, you only spend effort recalculating the standard deviation when you actually need it.
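To make that concrete, here is a sketch that reuses the caching pattern from the answer; the `data` argument and the extra `computations` counter are hypothetical additions made only so the number of real recomputations is visible.

```python
import statistics


class Foo:
    """Lazy, cached std with a counter showing how often it is really computed."""

    def __init__(self, data):
        self._data = list(data)
        self._std = None
        self.computations = 0  # illustration only: counts actual recomputations

    @property
    def std(self):
        if self._std is None:
            self.computations += 1
            self._std = statistics.pstdev(self._data)
        return self._std

    def alter_state(self, new_data):
        self._data = list(new_data)
        self._std = None  # cheap: just drop the cached value


foo = Foo([1, 2, 3, 4])
foo.std                        # computed
foo.std                        # served from the cache
foo.alter_state([2, 4, 6, 8])
foo.alter_state([1, 1, 1, 1])  # two updates in a row...
foo.std                        # ...but only one recomputation
print(foo.computations)        # 2
```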

Furthermore, this can also be an opportunity to update statistical measures when doing so is (very) cheap. Say, for instance, you have a list of objects that you frequently update in bulk. If you increment all elements by a constant, the mean also increments by that constant. So functions that alter the state in a way that lets some metrics be updated easily might update those metrics instead of setting them to None.
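A sketch of that idea, with a hypothetical `add_constant` method: shifting every element by c shifts the mean by exactly c, and it also leaves the standard deviation unchanged, so neither value needs a full recomputation.

```python
import statistics


class Foo:
    """Hypothetical sketch: keep cheap metrics in sync instead of invalidating them."""

    def __init__(self, data):
        self._data = list(data)
        self._mean = statistics.fmean(self._data)  # computed once up front (3.8+)
        self._std = None

    @property
    def mean(self):
        return self._mean

    @property
    def std(self):
        if self._std is None:
            self._std = statistics.pstdev(self._data)
        return self._std

    def add_constant(self, c):
        # Shifting every element by c shifts the mean by exactly c ...
        self._data = [x + c for x in self._data]
        self._mean += c
        # ... and leaves the standard deviation unchanged, so the cache stays valid.
```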

Note that whether .std is a property or a method is irrelevant. The user does not have to know how often the value has to be calculated: either way, once the standard deviation has been calculated, a second retrieval is quite fast.

Willem Van Onsem answered Oct 20 '22 08:10



Adding to Willem's answer: starting with Python 3.8, we have functools.cached_property. The official documentation even uses std and variance as examples. I'm linking the 3.9 documentation (https://docs.python.org/3.9/library/functools.html#functools.cached_property) since it has additional explanation of how it works.
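A sketch of the same class with functools.cached_property (Python 3.8+). The class and its `alter_state` method are hypothetical; the documented behaviour relied on is that the value is stored in the instance `__dict__` on first access and that deleting the attribute clears it. Popping from `__dict__` is used here so invalidation also works when the value was never computed (a plain `del` would raise AttributeError in that case).

```python
import statistics
from functools import cached_property  # Python 3.8+


class Foo:
    """Hypothetical class using cached_property instead of a hand-written cache."""

    def __init__(self, data):
        self._data = list(data)

    @cached_property
    def variance(self):
        return statistics.pvariance(self._data)

    @cached_property
    def std(self):
        # Computed on first access, then stored in the instance __dict__.
        return statistics.pstdev(self._data)

    def alter_state(self, new_data):
        self._data = list(new_data)
        # Drop any cached values so the next access recomputes them.
        self.__dict__.pop("std", None)
        self.__dict__.pop("variance", None)
```

Compared with the hand-written `_std = None` cache, this removes the boilerplate check, but invalidation is still your responsibility.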

Jodel Asur answered Oct 20 '22 10:10
