Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

django performance on querying models with many foreign keys?

Am building up a service that needs to maintain something along the lines of a case tracking system. Here's our model:

class Incident(models.Model):    
    title = models.CharField(max_length=128)
    category = models.ForeignKey(Category)
    status = models.ForeignKey(Status)    
    severity = models.ForeignKey(Severity)
    owned_by = models.ForeignKey(User, related_name="owned_by", null=True, blank=True)   
    next_action = models.ForeignKey(IncidentAction)    
    created_date  = models.DateTimeField()
    created_by = models.ForeignKey(User, related_name="opened_by")    
    last_edit_date = models.DateTimeField(null=True, blank=True)
    last_edit_by = models.ForeignKey(User, related_name="last_edit_by", null=True, blank=True)        
    closed_date  = models.DateTimeField(null=True, blank=True)
    closed_by = models.ForeignKey(User, related_name="Closed by", null=True, blank=True)

Because there are a lot of foreign keys being pulled into this model, it makes for interesting sql queries. We've been using as a trial the djblets data grid and the django debug toolbar, and are alarmed at the sheer number of queries being hit every time we add a new column for view that uses a foreign key, it does basically this type of query workflow:

#prepare the grid
select * from incident_table;
#render each row
for each row in incident table
    for each column that is a foreign key select row from foreign table with id

It does an additional select query per row for each column that tries to pull a property for a foreign key.

I'm thinking that this is a universal problem with django and its ORM with regard to pulling in properties from foreign key models for display. As a test, I dropped the data grid and just did a simple list of properties for a queryset, and saw the queries balloon up in a similar way.

We're wanting to scale this up with tons of users hitting the model. As comparison, I made a similar view off the User model, and its full display is just done with one query because if you only pull fields from the given model, it doesn't do an additional db hit per additional column.

Some optimizations we tried were:

  • django-orm-cache: Doesn't seem to work with django 1.0.4
  • django-caching: This works well for caching the oft queried models
  • view level caching with memcached
  • Edit: using select_related() may speed up template rendering by not having a round trip back to the database, but it seems like it follows the foreign keys just ahead of time on the original queryset using the single query per foreign key. Just seems to move the multi database query hit ahead of time.

But there are some deeper questions we're soliciting the wisdom of the crowd on:

  • For models with tons of foreign keys, what is the best way to make it efficiently query to get properties from foreign keys?
    • Is caching the dependent models the only way to go using the above ORM caching systems?
    • Or is this a standard case of outgrowing the ORM and needing to roll our own custom sql query with joins to get the desired datagrid output in as efficiently as possible?

Related questions that raised concerns on caching and foreign keys:

DB / performance: layout of django model that rarely refers to its parent more than once, Django ORM: caching and manipulating ForeignKey objects:

like image 235
dmyung Avatar asked Dec 28 '09 17:12

dmyung


People also ask

Can a Django model have 2 foreign keys?

Your intermediate model must contain one - and only one - foreign key to the source model (this would be Group in our example). If you have more than one foreign key, a validation error will be raised.

Does Django automatically index foreign keys?

Django automatically creates an index for all models. ForeignKey columns. From Django documentation: A database index is automatically created on the ForeignKey .

What is ForeignKey Django?

ForeignKey is a Django ORM field-to-column mapping for creating and working with relationships between tables in relational databases.


1 Answers

select_related() is the correct solution; you're wrong about how it's supposed to work. I think you're not using select_related correctly if you're still getting multiple queries across the specified FK. A quick log of a Python session (Studio has a FK to django.auth.user here):

>>> from django.db import connection
>>> studios = Studio.objects.all().select_related('user')
>>> for studio in studios:
>>>     print studio.user.email
>>>        
[email protected]
[email protected]
>>> len(connection.queries) 
1

So, I got a list of Studio objects (2 in my test DB) and got the User for each one in one SQL query. Without the select_related() call, it takes three queries.

Note that select_related doesn't handle many-to-many relationships -- though I think you can manually query the intermediate table of the m2m to follow those FKs either direction without requiring the extra query, as long as you're OK starting your queryset from the intermediate object. Maybe that's what's catching you? You only specified FK relationships, not m2ms, so I gave a simple illustration of that.

like image 154
Yoni Samlan Avatar answered Sep 21 '22 09:09

Yoni Samlan