Am building up a service that needs to maintain something along the lines of a case tracking system. Here's our model:
class Incident(models.Model):
title = models.CharField(max_length=128)
category = models.ForeignKey(Category)
status = models.ForeignKey(Status)
severity = models.ForeignKey(Severity)
owned_by = models.ForeignKey(User, related_name="owned_by", null=True, blank=True)
next_action = models.ForeignKey(IncidentAction)
created_date = models.DateTimeField()
created_by = models.ForeignKey(User, related_name="opened_by")
last_edit_date = models.DateTimeField(null=True, blank=True)
last_edit_by = models.ForeignKey(User, related_name="last_edit_by", null=True, blank=True)
closed_date = models.DateTimeField(null=True, blank=True)
closed_by = models.ForeignKey(User, related_name="Closed by", null=True, blank=True)
Because there are a lot of foreign keys being pulled into this model, it makes for interesting sql queries. We've been using as a trial the djblets data grid and the django debug toolbar, and are alarmed at the sheer number of queries being hit every time we add a new column for view that uses a foreign key, it does basically this type of query workflow:
#prepare the grid
select * from incident_table;
#render each row
for each row in incident table
for each column that is a foreign key select row from foreign table with id
It does an additional select query per row for each column that tries to pull a property for a foreign key.
I'm thinking that this is a universal problem with django and its ORM with regard to pulling in properties from foreign key models for display. As a test, I dropped the data grid and just did a simple list of properties for a queryset, and saw the queries balloon up in a similar way.
We're wanting to scale this up with tons of users hitting the model. As comparison, I made a similar view off the User model, and its full display is just done with one query because if you only pull fields from the given model, it doesn't do an additional db hit per additional column.
Some optimizations we tried were:
But there are some deeper questions we're soliciting the wisdom of the crowd on:
Related questions that raised concerns on caching and foreign keys:
DB / performance: layout of django model that rarely refers to its parent more than once, Django ORM: caching and manipulating ForeignKey objects:
Your intermediate model must contain one - and only one - foreign key to the source model (this would be Group in our example). If you have more than one foreign key, a validation error will be raised.
Django automatically creates an index for all models. ForeignKey columns. From Django documentation: A database index is automatically created on the ForeignKey .
ForeignKey is a Django ORM field-to-column mapping for creating and working with relationships between tables in relational databases.
select_related() is the correct solution; you're wrong about how it's supposed to work. I think you're not using select_related correctly if you're still getting multiple queries across the specified FK. A quick log of a Python session (Studio has a FK to django.auth.user here):
>>> from django.db import connection
>>> studios = Studio.objects.all().select_related('user')
>>> for studio in studios:
>>> print studio.user.email
>>>
[email protected]
[email protected]
>>> len(connection.queries)
1
So, I got a list of Studio objects (2 in my test DB) and got the User for each one in one SQL query. Without the select_related() call, it takes three queries.
Note that select_related doesn't handle many-to-many relationships -- though I think you can manually query the intermediate table of the m2m to follow those FKs either direction without requiring the extra query, as long as you're OK starting your queryset from the intermediate object. Maybe that's what's catching you? You only specified FK relationships, not m2ms, so I gave a simple illustration of that.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With