django select_related - when to use it

Tags: select, orm, django, django-models, django-queryset, django-orm

I'm trying to optimize my ORM queries in Django. I use connection.queries to view the queries that Django generates for me.

Assuming I have these models:

from django.db import models

class Author(models.Model):
    name = models.CharField(max_length=50)

class Book(models.Model):
    name = models.CharField(max_length=50)
    # on_delete is required since Django 2.0
    author = models.ForeignKey(Author, on_delete=models.CASCADE)

Let's say that when I generate a specific webpage, I want to display all books, with the author's name next to each of them. I also display all the authors separately.

So should I use

Book.objects.all().select_related("author")

which will result in a JOIN query, even if I already ran the following a line earlier:

Author.objects.all()

Obviously, in the template I will write something like {{book.author.name}}.
So the question is: when I access a foreign key value (author), if Django already has that object from another query, will that still result in an additional query (for each book)? If not, does using select_related actually create performance overhead in that case?

803

asked Oct 20 '15 07:10

user3599803

3 Answers

You are actually asking two different questions:

1. Does using select_related actually create performance overhead?

See the Django documentation on QuerySet evaluation and caching:

Understand QuerySet evaluation

To avoid performance problems, it is important to understand:

that QuerySets are lazy.

when they are evaluated.

how the data is held in memory.

In summary, Django caches in memory the results evaluated within the same QuerySet object. For example:

books = Book.objects.all().select_related("author")
for book in books:  # iterating evaluates the queryset once and caches the results
    print(book.author.name)  # no extra query: the author came with the JOIN
first_book = books[0]  # does not hit the db (served from the cache)
print(first_book.author.name)  # does not hit the db

This hits the database only once: because select_related fetched the authors together with the books, everything above results in a single query with an INNER JOIN.

BUT nothing is cached between QuerySet objects, not even for an identical query:

books = Book.objects.all().select_related("author")
books2 = Book.objects.all().select_related("author")
first_book = books[0]  # hits the db (books has not been evaluated yet)
first_book = books2[0]  # hits the db again

This is actually pointed out in the docs:

We will assume you have done the obvious things above. The rest of this document focuses on how to use Django in such a way that you are not doing unnecessary work. This document also does not address other optimization techniques that apply to all expensive operations, such as general purpose caching.
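The per-object caching described above can be mimicked without Django. The following is only a minimal sketch: LazyQuery is a hypothetical toy class, not Django's QuerySet, and the table and rows are made up. It uses the standard library's sqlite3 to show that a full evaluation fills a cache on that one object, while indexing an unevaluated object sends a query each time.

```python
import sqlite3


class LazyQuery:
    """Toy stand-in for a QuerySet: lazy, with a per-object result cache."""

    def __init__(self, conn, sql, log):
        self.conn = conn
        self.sql = sql
        self.log = log              # shared list recording every statement sent
        self._result_cache = None   # filled only by a full evaluation

    def _execute(self, sql):
        self.log.append(sql)
        return self.conn.execute(sql).fetchall()

    def __iter__(self):
        # Full evaluation: run the query once, then reuse the cached rows.
        if self._result_cache is None:
            self._result_cache = self._execute(self.sql)
        return iter(self._result_cache)

    def __getitem__(self, index):
        # Already evaluated: answer from memory.
        if self._result_cache is not None:
            return self._result_cache[index]
        # Not evaluated: fetch just that row and leave the cache empty.
        rows = self._execute(f"{self.sql} LIMIT 1 OFFSET {int(index)}")
        return rows[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO book (name) VALUES (?)", [("A",), ("B",), ("C",)])

log = []
sql = "SELECT name FROM book ORDER BY id"

books = LazyQuery(conn, sql, log)
names = [row[0] for row in books]   # evaluates the query: 1 statement
first = books[0]                    # served from the cache: still 1 statement
queries_after_cached_access = len(log)

books2 = LazyQuery(conn, sql, log)  # same SQL, separate object, separate cache
books2[0]                           # hits the database
books2[0]                           # hits the database again
queries_after_uncached_access = len(log)
```

After the run, queries_after_cached_access is 1 and queries_after_uncached_access is 3, matching the "Does not hit db" and "Does hit db" comments in the two snippets above.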

2. If Django already has that object from another query, will that still result in an additional query (for each book)?

What you are really asking is whether Django caches ORM queries, which is a very different matter. ORM query caching means that if you run a query and later run the same query again, and the database has not changed in between, the result comes from a cache instead of an expensive database lookup.

The answer is: not in Django itself, where this is not officially supported, but yes through third-party apps. The most relevant third-party apps that enable this type of caching are:

Johnny-Cache (older, not supporting django>1.6)
Django-Cachalot (newer, supports 1.6, 1.7, and still in dev 1.8)
Django-Cacheops (newer, supports Python 2.7 or 3.3+, Django 1.8+ and Redis 2.6+ (4.0+ recommended))

Take a look at those if you are looking for query caching, and remember: first profile and find the bottlenecks, and only optimize if they are actually causing a problem.

"The real problem is that programmers have spent far too much time worrying about efficiency in the wrong places and at the wrong times; premature optimization is the root of all evil (or at least most of it) in programming." (Donald Knuth)

answered Sep 29 '22 16:09

danius

Django doesn't know about other queries! Author.objects.all() and Book.objects.all() are totally different querysets. So if you have both in your view and pass them to the template context, but in your template you do something like:

{% for book in books %}
  {{ book.author.name }}
{% endfor %}

and you have N books, this will result in N extra database queries (beyond the queries to get all books and authors)!

If instead you had used Book.objects.all().select_related("author"), no extra queries would be made by the above template snippet.

Now, select_related() of course adds some overhead to the query. When you do Book.objects.all(), Django runs something like SELECT * FROM BOOKS. If instead you do Book.objects.all().select_related("author"), Django runs something like SELECT * FROM BOOKS B INNER JOIN AUTHORS A ON B.AUTHOR_ID = A.ID (a LEFT OUTER JOIN when the foreign key is nullable). So for each book it returns both the columns of the book and those of its author. However, this overhead is much smaller than the overhead of hitting the database N times (as explained above).

So, even though select_related creates a small performance overhead (each query returns more columns from the database), it is beneficial to use it unless you are sure that you need only the columns of the model you are querying.
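To make the trade-off concrete, here is a rough sketch that counts queries for both approaches. It does not use Django: it relies on the standard library's sqlite3, and the schema, rows, and the CountingConnection helper are all made up for illustration. The point is only the query count, 1 + N without the join versus 1 with it.

```python
import sqlite3


class CountingConnection:
    """Wraps a sqlite3 connection and counts the SELECTs sent through it."""

    def __init__(self, conn):
        self.conn = conn
        self.count = 0

    def select(self, sql, params=()):
        self.count += 1
        return self.conn.execute(sql, params).fetchall()


def make_db():
    # Hypothetical schema and rows, loosely mirroring the Book/Author models.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE book (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            author_id INTEGER NOT NULL REFERENCES author(id)
        );
        INSERT INTO author (id, name) VALUES (1, 'Ann');
        INSERT INTO author (id, name) VALUES (2, 'Bob');
        INSERT INTO book (name, author_id) VALUES ('Book A', 1);
        INSERT INTO book (name, author_id) VALUES ('Book B', 2);
        INSERT INTO book (name, author_id) VALUES ('Book C', 1);
    """)
    return CountingConnection(conn)


def books_without_join(db):
    # Like Book.objects.all(): 1 query for the books, then 1 per book for its author.
    result = []
    for book_name, author_id in db.select("SELECT name, author_id FROM book ORDER BY id"):
        author_name = db.select("SELECT name FROM author WHERE id = ?", (author_id,))[0][0]
        result.append((book_name, author_name))
    return result


def books_with_join(db):
    # Like select_related("author"): a single query that joins both tables.
    return db.select(
        "SELECT b.name, a.name FROM book b "
        "INNER JOIN author a ON b.author_id = a.id ORDER BY b.id"
    )


lazy_db, joined_db = make_db(), make_db()
assert books_without_join(lazy_db) == books_with_join(joined_db)
print(lazy_db.count, joined_db.count)  # prints: 4 1
```

Both functions return the same rows; with three books the first sends four queries and the second sends one, and the gap grows linearly with the number of books.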

Finally, a great way to see how many queries (and exactly which ones) are actually executed in your database is to use django-debug-toolbar (https://github.com/django-debug-toolbar/django-debug-toolbar).

answered Sep 29 '22 15:09

Serafeim

Book.objects.select_related("author")

is good enough. No need for Author.objects.all()

{{ book.author.name }}

won't hit the database, because book.author has been prepopulated already.

answered Sep 29 '22 17:09

doniyor
