
Python - caching a property to avoid future calculations

In the following example, cached_attr is used to get or set an attribute on a model instance when a database-expensive property (related_spam in the example) is called. I use cached_spam to save queries, and I put print statements in the setting and getting branches so that I could test it out.

I tested it in a view by passing an Egg instance into the view and using {{ egg.cached_spam }} in the template, as well as through other methods on the Egg model that call cached_spam themselves. When I finished and tested it out, the shell output in Django's development server showed that the attribute cache was missed several times, as well as successfully hit several times. It seems to be inconsistent: with the same data, when I made small changes (as little as changing the print statement's string) and refreshed, different numbers of misses/hits happened.

How and why is this happening? Is this code incorrect or highly problematic?

class Egg(models.Model):
    ... fields

    @property
    def related_spam(self):
        # Each time this property is called the database is queried (expected).
        return Spam.objects.filter(egg=self).all()  # Spam has foreign key to Egg.

    @property
    def cached_spam(self):
        # This should call self.related_spam the first time, and then return
        # cached results every time after that.
        return self.cached_attr('related_spam')

    def cached_attr(self, attr):
        """This method (normally attached via an abstract base class, but put
        directly on the model for this example) attempts to return a cached
        version of a requested attribute, and calls the actual attribute when
        the cached version isn't available."""
        try:
            value = getattr(self, '_p_cache_{0}'.format(attr))
            print('GETTING - {0}'.format(value))
        except AttributeError:
            value = getattr(self, attr)
            print('SETTING - {0}'.format(value))
            setattr(self, '_p_cache_{0}'.format(attr), value)
        return value
orokusaki asked Nov 26 '10

2 Answers

There's nothing wrong with your code, as far as it goes. The problem probably isn't there, but in how you use that code.

The main thing to realise is that model instances don't have identity. That means that if you instantiate an Egg object somewhere, and a different one somewhere else, even if they refer to the same underlying database row they won't share internal state. So calling cached_attr on one won't cause the cache to be populated in the other.

For example, assuming you have a RelatedObject class with a ForeignKey to Egg:

my_first_egg = Egg.objects.get(pk=1)
my_related_object = RelatedObject.objects.get(egg__pk=1)
my_second_egg = my_related_object.egg

Here my_first_egg and my_second_egg both refer to the database row with pk 1, but they are not the same object:

>>> my_first_egg.pk == my_second_egg.pk
True
>>> my_first_egg is my_second_egg
False

So, filling the cache on my_first_egg doesn't fill it on my_second_egg.

And, of course, objects won't persist across requests (unless they're specifically made global, which is horrible), so the cache won't persist either.
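The point above can be demonstrated without Django at all. The sketch below (hypothetical names, with related_spam as a plain method standing in for the expensive query) shows that populating the cache on one instance leaves a second instance for the same "row" cold:

```python
# Minimal stand-in for the Egg model: two instances representing the same
# database row do not share the per-instance attribute cache.

class Egg:
    def __init__(self, pk):
        self.pk = pk

    def related_spam(self):
        # Stand-in for the database-expensive query.
        return ['spam-for-egg-{0}'.format(self.pk)]

    def cached_attr(self, attr):
        key = '_p_cache_{0}'.format(attr)
        try:
            return getattr(self, key)
        except AttributeError:
            value = getattr(self, attr)()
            setattr(self, key, value)
            return value

first = Egg(pk=1)
second = Egg(pk=1)  # same "row", different instance

first.cached_attr('related_spam')                    # populates first's cache
assert hasattr(first, '_p_cache_related_spam')       # first is warm
assert not hasattr(second, '_p_cache_related_spam')  # second is still cold
```

Every code path that fetches the same row through a different query (or a related-object accessor) gets a fresh instance with an empty cache, which is why the hit/miss pattern looks inconsistent.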

Daniel Roseman answered Oct 05 '22


HTTP servers that scale are shared-nothing; you can't rely on anything being a singleton. To share state, you need to connect to a special-purpose service.

Django's caching support is appropriate for your use case. It isn't necessarily a global singleton either; if you use the locmem:// backend, it will be process-local, which could be the more efficient choice.
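The get-or-set pattern against such a cache service can be sketched as follows. In a real project this would go through Django's cache framework (django.core.cache); here a module-level dict stands in for a process-local backend so the example runs on its own, and the key format is an assumption for illustration:

```python
# Get-or-set against a cache keyed by the database row, not held on the
# model instance. A module-level dict stands in for a locmem-style backend.

_local_cache = {}

def get_or_set(key, compute, cache=_local_cache):
    """Return the cached value for key, computing and storing it on a miss."""
    try:
        return cache[key]
    except KeyError:
        value = compute()
        cache[key] = value
        return value

calls = []

def expensive_query():
    calls.append(1)  # track how often the "database" is hit
    return ['spam']

# Two lookups that would come from *different* model instances now share
# state, because the cache is keyed by the row rather than the instance.
first = get_or_set('egg:1:related_spam', expensive_query)
second = get_or_set('egg:1:related_spam', expensive_query)
assert first == second == ['spam']
assert len(calls) == 1  # the expensive call ran only once
```

Because the key identifies the row, every instance (and, with a shared backend like memcached, every process) sees the same cached value.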

Tobu answered Oct 06 '22