
Why is Django returning stale cache data?

I have two Django models as shown below, MyModel1 & MyModel2:

from django.db import models
from mptt.models import MPTTModel
from caching.base import CachingManager, CachingMixin  # django-cache-machine

class MyModel1(CachingMixin, MPTTModel):
    name = models.CharField(null=False, blank=False, max_length=255)
    objects = CachingManager()  # routes queries through the cache

    def __str__(self):
        return "; ".join(["ID: %s" % self.pk, "name: %s" % self.name])

class MyModel2(CachingMixin, models.Model):
    name = models.CharField(null=False, blank=False, max_length=255)
    model1 = models.ManyToManyField(MyModel1, related_name="MyModel2_MyModel1")
    objects = CachingManager()  # routes queries through the cache

    def __str__(self):
        return "; ".join(["ID: %s" % self.pk, "name: %s" % self.name])

MyModel2 has a ManyToManyField to MyModel1 named model1.

Now look what happens when I add a new entry to this ManyToMany field. According to Django, it has no effect:

>>> m1 = MyModel1.objects.all()[0]
>>> m2 = MyModel2.objects.all()[0]
>>> m2.model1.all()
[]
>>> m2.model1.add(m1)
>>> m2.model1.all()
[]

Why? It looks like a caching issue, because the database table myapp_mymodel2_mymodel1 does contain a new row linking m2 and m1. How should I fix it?

asked Jun 16 '16 03:06 by Saqib Ali




2 Answers

Is django-cache-machine really needed?

MyModel1.objects.all()[0]

Roughly translates to

SELECT * FROM myapp_mymodel1 LIMIT 1

A query like this is fast. Fetching a single row from the database takes about as long as fetching it from the cache, so there is no significant speed difference.

The caching manager actually adds some overhead here, which can make things slightly slower. Most of the time that effort is wasted, because a cache hit is not guaranteed, as explained in the next section.

How django-cache-machine works

Whenever you run a query, CachingQuerySet will try to find that query in the cache. Queries are keyed by {prefix}:{sql}. If it’s there, we return the cached result set and everyone is happy. If the query isn’t in the cache, the normal codepath to run a database query is executed. As the objects in the result set are iterated over, they are added to a list that will get cached once iteration is done.

source: https://cache-machine.readthedocs.io/en/latest/

Accordingly, since the two m2.model1.all() queries in your question are identical, the caching manager will serve the second result set from the cache, provided the cached entry hasn't been invalidated.

The same link explains how cache keys are invalidated.

To support easy cache invalidation, we use “flush lists” to mark the cached queries an object belongs to. That way, all queries where an object was found will be invalidated when that object changes. Flush lists map an object key to a list of query keys.

When an object is saved or deleted, all query keys in its flush list will be deleted. In addition, the flush lists of its foreign key relations will be cleared. To avoid stale foreign key relations, any cached objects will be flushed when the object their foreign key points to is invalidated.

It's clear that saving or deleting an object can force many cached objects to be invalidated, so the caching manager slows those operations down. It is also worth noting that the invalidation documentation does not mention many-to-many fields at all. There is an open issue for this, and from your comment on that issue it's clear that you have discovered it too.
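
To make the mechanism concrete, here is a toy model of a query cache with flush lists. It is only a sketch of the idea described in the quoted documentation, not cache-machine's actual code: the class ToyQueryCache, the string keys, and the in-memory "join table" are all made up for illustration. It shows how a write that never triggers invalidation leaves a stale result in the cache, and how invalidating an object that appeared in the cached result clears it.

```python
from collections import defaultdict


class ToyQueryCache:
    """Toy query cache with flush lists (illustrative, not cache-machine itself)."""

    def __init__(self):
        self.results = {}                    # query key -> cached result list
        self.flush_lists = defaultdict(set)  # object key -> query keys it appears in

    def fetch(self, query_key, run_query):
        # Cache hit: return the stored result without running the query.
        if query_key in self.results:
            return self.results[query_key]
        # Cache miss: run the query, cache the result, and add the query key
        # to the flush list of every object in the result.
        rows = list(run_query())
        self.results[query_key] = rows
        for obj_key in rows:
            self.flush_lists[obj_key].add(query_key)
        return rows

    def invalidate(self, obj_key):
        # Delete every cached query that the object was found in.
        for query_key in self.flush_lists.pop(obj_key, set()):
            self.results.pop(query_key, None)


# Hypothetical join table: (MyModel2 key, MyModel1 key) pairs.
links = {("m2", "m1a")}
cache = ToyQueryCache()


def related_to_m2():
    return sorted(m1 for (m2, m1) in links if m2 == "m2")


first = cache.fetch("m2.model1.all()", related_to_m2)  # miss, result is cached
links.add(("m2", "m1b"))                               # row added, nothing invalidated
stale = cache.fetch("m2.model1.all()", related_to_m2)  # hit, new row is missing
cache.invalidate("m1a")                                # flushes the cached query
fresh = cache.fetch("m2.model1.all()", related_to_m2)  # miss, query runs again
```

In this sketch, adding a row to the join table plays the role of m2.model1.add(m1): the data changes, but no object is saved or deleted, so nothing walks the flush lists.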

Solution

Chuck cache machine. Caching every query is almost never worth it, and it leads to all kinds of hard-to-find bugs. The better approach is to optimize your tables and fine-tune your queries. If you then find a particular query that is still too slow, cache that one manually.
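
As a rough illustration of caching one query manually, the sketch below wraps a single expensive call and deletes its cache entry explicitly when the data changes. SimpleTTLCache is a hypothetical in-process stand-in so the example runs on its own; its get, set, and delete methods mirror the names in Django's low-level cache API (django.core.cache.cache), which you would presumably use in a real project. The slow_report function and the "slow_report" key are also made up.

```python
import time


class SimpleTTLCache:
    """Minimal in-process cache with per-key expiry (stand-in for a real backend)."""

    def __init__(self):
        self._store = {}  # key -> (expires_at, value)

    def get(self, key, default=None):
        entry = self._store.get(key)
        if entry is None:
            return default
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            # Entry has expired: drop it and report a miss.
            del self._store[key]
            return default
        return value

    def set(self, key, value, timeout):
        self._store[key] = (time.monotonic() + timeout, value)

    def delete(self, key):
        self._store.pop(key, None)


cache = SimpleTTLCache()
calls = {"count": 0}
_MISSING = object()  # sentinel, so a cached empty list still counts as a hit


def slow_report():
    # Hypothetical expensive query; here it only counts how often it runs.
    calls["count"] += 1
    return ["row-1", "row-2"]


def cached_report():
    value = cache.get("slow_report", _MISSING)
    if value is _MISSING:
        value = slow_report()
        cache.set("slow_report", value, timeout=60)
    return value


def on_data_changed():
    # Call this from every code path that modifies the underlying rows.
    cache.delete("slow_report")


first = cached_report()   # miss: runs the query
second = cached_report()  # hit: served from the cache
on_data_changed()         # explicit invalidation
third = cached_report()   # miss again: runs the query a second time
```

Because only one key is involved, it is easy to see exactly where it is written and where it is deleted, which is the main advantage over caching every query implicitly.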

answered Sep 28 '22 12:09 by e4c5


This was my workaround: explicitly invalidate both objects after adding the relation, then save m2.

    >>> m1 = MyModel1.objects.all()[0]
    >>> m1
    <MyModel1: ID: 8887972990743179; name: my-name-blahblah>

    >>> m2 = MyModel2.objects.all()[0]
    >>> m2.model1.all()
    []
    >>> m2.model1.add(m1)
    >>> m2.model1.all()
    []

    >>> MyModel1.objects.invalidate(m1)
    >>> MyModel2.objects.invalidate(m2)
    >>> m2.save()
    >>> m2.model1.all()
    [<MyModel1: ID: 8887972990743179; name: my-name-blahblah>]

answered Sep 28 '22 11:09 by Saqib Ali