Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Optimizing a slow django queryset from a large and growing dataset

I have a page that loads way too slow. Somehow I need to improve the way it's data is queried (caching? partial load/pages? etc.)

Note I am a django noob and haven't quite wrapped my head around model.Manager and and models.query.QuerySet so if this setup looks awkward....

Currently, the page is taking about 18 seconds to load the queryset, and it currently only has about 500 records. There will be on average about 100 new records a day.

Network stats

The database is Postgresql

The slow view:

def approvals(request):
    ...
    approved_submissions = QuestSubmission.objects.all_approved()
    ...

The slow queryset:

class QuestSubmissionQuerySet(models.query.QuerySet):
    ...

    def approved(self):
        return self.filter(is_approved=True)

    def completed(self):
         return self.filter(is_completed=True).order_by('-time_completed')

    ...

class QuestSubmissionManager(models.Manager):
    def get_queryset(self):
        return QuestSubmissionQuerySet(self.model, using=self._db)

    def all_approved(self, user=None):
        return self.get_queryset().approved().completed()

    ...

Resulting SQL from QuestSubmission.objects.all_approved():

'SELECT "quest_manager_questsubmission"."id", "quest_manager_questsubmission"."quest_id", "quest_manager_questsubmission"."user_id", "quest_manager_questsubmission"."ordinal", "quest_manager_questsubmission"."is_completed", "quest_manager_questsubmission"."time_completed", "quest_manager_questsubmission"."is_approved", "quest_manager_questsubmission"."time_approved", "quest_manager_questsubmission"."timestamp", "quest_manager_questsubmission"."updated", "quest_manager_questsubmission"."game_lab_transfer" FROM "quest_manager_questsubmission" WHERE ("quest_manager_questsubmission"."is_approved" = True AND "quest_manager_questsubmission"."is_completed" = True) ORDER BY "quest_manager_questsubmission"."time_completed" DESC'

The slow model:

class QuestSubmission(models.Model):
    quest = models.ForeignKey(Quest)
    user = models.ForeignKey(settings.AUTH_USER_MODEL, related_name="quest_submission_user")
    ordinal = models.PositiveIntegerField(default = 1, help_text = 'indicating submissions beyond the first for repeatable quests')
    is_completed = models.BooleanField(default=False)
    time_completed = models.DateTimeField(null=True, blank=True)
    is_approved = models.BooleanField(default=False)
    time_approved = models.DateTimeField(null=True, blank=True)
    timestamp = models.DateTimeField(auto_now=True, auto_now_add=False)
    updated = models.DateTimeField(auto_now=False, auto_now_add=True)
    game_lab_transfer = models.BooleanField(default = False, help_text = 'XP not counted')

    class Meta:
        ordering = ["time_approved", "time_completed"]

    objects = QuestSubmissionManager()

    #other methods
    ....

What strategies are available to fix this? I tried using django's Paginator, but it only seems to display in pages, but it still loads the entire queryset.

like image 848
43Tesseracts Avatar asked Sep 19 '15 16:09

43Tesseracts


People also ask

How can I make Django ORM faster?

Using select_related() and prefetch_related() functions Django ORM provides two common methods to avoid the N+1 issue, which are select_related and prefetch_related. The select_related method performs a simple JOIN operation of tables and can be used for foreign key or one-to-one model fields.

Why Django ORM is lazy?

Django's ORM allows you to interact with a relational database in a way that feels natural for the Object-Oriented Python programming language. The code is precise code and semantically clear. Django uses lazy loading to only load the author if you need it. This is great, but can lead to an explosion of queries.

Is Django QuerySet lazy?

This is because a Django QuerySet is a lazy object. It contains all of the information it needs to populate itself from the database, but will not actually do so until the information is needed.


1 Answers

The first thing to look at:

  • Is this query slow because it is returning a very large result set?

or

  • Is this query slow because it's taking a while to filter down the table?

Assuming the former, you don't have a lot of good options besides "return less data".

If it's the latter, you should probably run an EXPLAIN on the database, but right off the bat I'd say you probably want an index, possibly on (is_approved, is_completed). Which can be done with:

class Meta:
    index_together = [
        ["is_completed", "is_approved"],
    ]
like image 97
Alex Gaynor Avatar answered Sep 26 '22 07:09

Alex Gaynor