Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

A Question About Nested Collections in Django and Query Efficiency

I see a lot of answers like this one:

Printing a list of persons with more than one home each home with more than one

I have tried that answer with similar models and it seems like a terribly inefficient way of doing this. Each iteration seems to make a separate query sometimes resulting in thousands of queries to a database. I realize that you can cache the query sets, but it still seems very wrong. So the question is, do you use that method? If not, how do you do it?

like image 635
Matt Avatar asked Apr 13 '09 00:04

Matt


2 Answers

This is a very good question, and one not limited to Django's ORM framework.

I always feel it's important to remember some of the problems that an object-relational mapping (ORM) framework solves:

  • Object-oriented CRUD: If the rest of the application is based on strong object-oriented principles, accessing data persistence using objects makes the code just that much more coherent, internally consistent, and sometimes shorter.

  • Persistence layer encapsulation: An ORM provides a clear layer in your application for DB access. It encapsulates all the functions needed to read/write data in one spot, the epitome of the so-called DRY (do not repeat yourself) principle. This makes a few things much easier: model changes, because all the DB-facing select and insert/update code is in one spot rather than throughout the app, security, because all DB access goes through one location, and testing, because it's easy to mock out your data models and access if they are clearly delineated.

  • SQL security: While it's easy to secure raw SQL use against injection attacks and such, it's even easier if you have an ORM framework as a single point of DB-contact that does it for you so you never have to think about it.

Notice that speed is not on the list. An ORM is a level of indirection between your code and the database. We certainly hold ORM designers responsible for writing a framework that produces good SQL statements, but an ORM is meant to provide code- and architecture-level efficiency, not executional efficiency. A developer who has read a basic book on SQL will always be able to get better performance talking directly to the DB.

There are certainly strategies to counter this, and in Django those are select_related() as ozan has mentioned, and site/view/miscellaneous caching, but they won't give you the same performance as a direct SQL statement. Because of this, I would never use an ORM framework that does not provide some mechanism for issuing a raw SQL statement on those occasions when I need speed. For example, I often resort to raw SQL when generating a large report out of the database that joins many tables; the ORM way can take minutes, the SQL way can take seconds.

Having said that, I never start by worrying about each individual query. My advice for anyone coming to an ORM layer is: don't nanny the ORM's database access. Write your application or module, and then profile it, tweaking those areas that truly need the performance boost, or using caching/select_related to reduce the overall DB-chattiness of your application.

like image 188
Jarret Hardie Avatar answered Nov 17 '22 15:11

Jarret Hardie


You can use the select_related() queryset method to reduce the number of database queries. You can also specify the depth, so in the given example if the telephone number model had additional foreign relationships you would used select_related(depth=2) to avoid selecting further "levels" of related entities.

like image 45
ozan Avatar answered Nov 17 '22 15:11

ozan