
Django: Why is Foo.objects.extra(...) So Much Faster Than Foo.objects.raw?

I am trying to optimize a fairly odd query against a legacy database, so I have to make do with what I have. These are the two queries I am trying; at this point they produce the same output. w is my queryset.

import datetime

def future_schedule(request):
    # Cutoff date: two years (730 days) before today.
    past = datetime.date.today() - datetime.timedelta(days=730)

    extra_select = {
        'addlcomplete': 'SELECT Complete FROM tblAdditionalDates WHERE Checkin.ShortSampleID = tblAdditionalDates.ShortSampleID',
        'addldate': 'SELECT AddlDate FROM tblAdditionalDates WHERE Checkin.ShortSampleID = tblAdditionalDates.ShortSampleID'
    }
    extra_where = ['''(Checkin.Description <> "Sterilization Permit" AND Checkin.Description <> "Registration State" AND Checkin.Description <> "Miscellaneous" AND Checkin.Description <> "Equipment Purchase" AND Checkin.DateArrived > %s AND Checkin.DateCompleted IS NULL AND Checkin.Canceled = 0) OR (Checkin.Description <> "Sterilization Permit" AND Checkin.Description <> "Registration State" AND Checkin.Description <> "Miscellaneous" AND Checkin.Description <> "Equipment Purchase" AND Checkin.DateArrived > %s AND Checkin.DateCompleted IS NOT NULL AND Checkin.DateFinalCompleted IS NULL AND Checkin.DateFinalExpected IS NOT NULL AND Checkin.Canceled = 0) '''
    ]
    extra_params = [past, past]

    w = Checkin.objects.extra(select=extra_select, where=extra_where, params=extra_params)

    # OR this one

    w = Checkin.objects.raw('''SELECT Checkin.SampleID, Checkin.ShortSampleID, Checkin.Company, A.Complete, Checkin.HasDates, A.AddlDate FROM Checkin LEFT JOIN (SELECT ShortSampleID, Complete, AddlDate FROM tblAdditionalDates) A ON A.ShortSampleID = Checkin.ShortSampleID WHERE (Checkin.Description <> "Sterilization Permit" AND Checkin.Description <> "Registration State" AND Checkin.Description <> "Miscellaneous" AND Checkin.Description <> "Equipment Purchase" AND Checkin.DateArrived > "2009-01-01" AND Checkin.DateCompleted IS NULL AND Checkin.Canceled = 0) OR (Checkin.Description <> "Sterilization Permit" AND Checkin.Description <> "Registration State" AND Checkin.Description <> "Miscellaneous" AND Checkin.Description <> "Equipment Purchase" AND Checkin.DateArrived > "2009-01-01" AND Checkin.DateCompleted IS NOT NULL AND Checkin.DateFinalCompleted IS NULL AND Checkin.DateFinalExpected IS NOT NULL AND Checkin.Canceled = 0)''')

Both of these return the same number of records (322). The .extra version renders the HTML about 10 seconds faster than the .raw version, even though, for all intents and purposes, the .raw query is if anything slightly less complex. Does anyone have any insight as to why this might be? Based on my structure, .raw may be the only way to get the data I need (I need the addlcomplete and addldate values from the extra_select dict, and I need to use them in a HAVING clause to further filter the queryset), but I certainly don't like how long it is taking. Is the slowdown in the template layer or in the actual query layer? How can I best debug this?

Thanks for your help in this quest for optimization amidst poor data structures.

UPDATE 1: 2011-10-03

So I installed django-debug-toolbar to snoop around a bit, enabled MySQL general logging, and came up with the following:

Using .filter() or .extra(), the total query count is 2. Using .raw(), the total query count is 1984!!! (Spooky literary reference not ignored.)

My template uses a regroup and then loops through that regroup. No relations are being followed, and no template tags other than builtins are being used. select_related is NOT being used, and I still only get the 2 queries with .filter() or .extra(). Looking at the MySQL log for the .raw() version, sure enough: 1984 queries.

When looking at the queries that were executed, it basically looks like for every {{ Modelinstance.field }} Django was doing a SELECT pk, field FROM Model WHERE Model.pk = Modelinstance.pk. This seems completely wrong if you ask me. Am I missing something here, or is Django really running wild with queries?

END UPDATE 1

UPDATE 2: See the answer below.

Greg

geraldcor asked Oct 03 '11 15:10



1 Answer

Ok. Here are my final conclusions. While Furbeenator is correct about the internal Django optimizations, it turns out there was a much larger user error that caused the slowdown and the aforementioned thousands of queries.

It is clearly documented in the raw queryset docs that when you defer fields (i.e. you select only certain fields, as in SELECT Checkin.SampleID, ..., rather than using SELECT * FROM ...), the fields that you don't select can still be accessed, but each access costs another database call. So, if you select a subset of fields in your raw query and forget a field that you use in your template, Django performs a database lookup to fetch that field when the template references it, rather than complaining that it doesn't exist. So, let's say you leave out 5 fields from your query (which is what I did) that you end up referencing in your template, and you have 300 records that you are looping through. This incurs 1500 extra database hits (5 fields x 300 records) to get those 5 fields for each record.
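To make the mechanism concrete, here is a toy simulation that uses only Python's built-in sqlite3 module. It is not Django's implementation, just a rough sketch of the same idea: the LazyRow class, the render helper, the checkin table layout, and the column names are all hypothetical and invented for illustration. Any column left out of the initial SELECT is fetched with one extra query per row the first time it is accessed.

```python
import sqlite3

# Hypothetical column names, chosen only for this illustration.
COLUMNS = ("company", "description", "date_arrived",
           "date_completed", "canceled", "has_dates")


class LazyRow:
    """Toy stand-in for a model instance with deferred fields."""

    def __init__(self, conn, stats, pk, loaded):
        self._conn = conn
        self._stats = stats
        self.pk = pk
        # Columns that were part of the initial SELECT become plain attributes.
        self.__dict__.update(loaded)

    def __getattr__(self, name):
        # Python only calls __getattr__ when normal lookup fails, i.e. for
        # columns that were NOT part of the initial SELECT.
        if name not in COLUMNS:
            raise AttributeError(name)
        self._stats["queries"] += 1
        value = self._conn.execute(
            f"SELECT {name} FROM checkin WHERE id = ?", (self.pk,)
        ).fetchone()[0]
        setattr(self, name, value)  # cache it so a repeat access is free
        return value


def render(selected, used, rows=300):
    """Load `rows` records selecting only `selected`, then touch every
    column in `used` on every record. Returns the total query count."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        f"CREATE TABLE checkin (id INTEGER PRIMARY KEY, {', '.join(COLUMNS)})"
    )
    conn.executemany(
        "INSERT INTO checkin VALUES (?, ?, ?, ?, ?, ?, ?)",
        [(i, "Acme", "Test", "2011-01-01", None, 0, 1)
         for i in range(1, rows + 1)],
    )
    stats = {"queries": 1}  # count the initial SELECT itself
    cursor = conn.execute(f"SELECT id, {', '.join(selected)} FROM checkin")
    objs = [LazyRow(conn, stats, r[0], dict(zip(selected, r[1:])))
            for r in cursor]
    for obj in objs:          # stands in for the template loop
        for name in used:     # stands in for each {{ obj.field }}
            getattr(obj, name)
    conn.close()
    return stats["queries"]


print(render(selected=("company",), used=COLUMNS))  # 1 + 300 * 5 = 1501
print(render(selected=COLUMNS, used=COLUMNS))       # 1
```

With 300 rows and 5 columns missing from the SELECT, the count jumps from 1 query to 1501, which matches the arithmetic above. The fix in the real code is the same as in the sketch: include every field the template uses in the raw query's SELECT list.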

So, beware of hidden references, and thank god for Django Debug Toolbar.

geraldcor answered Sep 23 '22 15:09