What's the difference between select_related and prefetch_related in Django ORM?

Tags: python, django, django-models, django-orm

In the Django docs:

select_related() "follows" foreign-key relationships, selecting additional related-object data when it executes its query.

prefetch_related() does a separate lookup for each relationship, and does the "joining" in Python.

What does it mean by "doing the joining in python"? Can someone illustrate with an example?

My understanding is that for foreign key relationships I should use select_related, and for M2M relationships prefetch_related. Is this correct?

362

asked Jul 06 '15 02:07

NeoWang

2 Answers

Your understanding is mostly correct. You use select_related when the object you're going to be selecting is a single object, i.e. a OneToOneField or a ForeignKey. You use prefetch_related when you're going to get a "set" of things, i.e. ManyToManyFields, as you stated, or reverse ForeignKeys. To clarify what I mean by "reverse ForeignKeys", here's an example:

from django.db import models

class ModelA(models.Model):
    pass

class ModelB(models.Model):
    # on_delete is a required argument since Django 2.0
    a = models.ForeignKey(ModelA, on_delete=models.CASCADE)

ModelB.objects.select_related('a').all()  # Forward ForeignKey relationship
ModelA.objects.prefetch_related('modelb_set').all()  # Reverse ForeignKey relationship

The difference is that select_related performs an SQL join, so the related object's columns come back from the database server in the same result set as the original rows. prefetch_related, on the other hand, executes an additional query, which avoids repeating the related columns on every row of the original query (ModelA in the example above). You may use prefetch_related for anything that you can use select_related for.
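To make "doing the joining in Python" concrete, here is a minimal sketch that uses the standard-library sqlite3 module instead of Django, so it runs on its own. The tables modela and modelb and their data are hypothetical stand-ins for the models above, and the SQL only approximates what Django generates. The first query is the JOIN that select_related('a') corresponds to; the two queries after it plus the dictionary lookup correspond to prefetch_related('a').

```python
import sqlite3

# Hypothetical tables mirroring ModelA / ModelB above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE modela (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE modelb (id INTEGER PRIMARY KEY, a_id INTEGER REFERENCES modela(id));
    INSERT INTO modela VALUES (1, 'first'), (2, 'second');
    INSERT INTO modelb VALUES (10, 1), (11, 1), (12, 2);
""")

# select_related('a')-style: one JOIN; ModelA's columns are repeated on every ModelB row.
joined = conn.execute("""
    SELECT modelb.id, modela.id, modela.name
    FROM modelb JOIN modela ON modelb.a_id = modela.id
    ORDER BY modelb.id
""").fetchall()

# prefetch_related('a')-style: query 1 fetches the ModelB rows...
b_rows = conn.execute("SELECT id, a_id FROM modelb ORDER BY id").fetchall()

# ...query 2 fetches each distinct ModelA row once, filtered with WHERE ... IN (...)...
a_ids = sorted({a_id for _, a_id in b_rows})
placeholders = ", ".join("?" for _ in a_ids)
a_rows = conn.execute(
    f"SELECT id, name FROM modela WHERE id IN ({placeholders})", a_ids
).fetchall()

# ...and the "join" itself is a dictionary lookup done in Python.
a_by_id = {a_id: name for a_id, name in a_rows}
stitched = [(b_id, a_id, a_by_id[a_id]) for b_id, a_id in b_rows]

print(joined)    # [(10, 1, 'first'), (11, 1, 'first'), (12, 2, 'second')]
print(stitched)  # [(10, 1, 'first'), (11, 1, 'first'), (12, 2, 'second')]
```

Both routes end up with the same data: the JOIN repeats ModelA's columns on every ModelB row, while the prefetch route fetches each ModelA row once and pays for it with a second query.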

The tradeoff is that prefetch_related has to build a list of IDs and send it back to the server, which can take a while. I'm not sure if there's a nice way of doing this in a transaction, but my understanding is that Django always just sends a list, basically SELECT ... WHERE pk IN (...,...,...). If the prefetched data is sparse (say, U.S. State objects linked to people's addresses), this can be very efficient; however, if the relationship is closer to one-to-one, it can waste a lot of communication. If in doubt, try both and see which performs better.
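One rough way to see the sparse versus near one-to-one difference is to count how many IDs the second query has to carry. This sketch assumes the IN list contains each distinct key once; the table names state and profile, the in_clause helper, and the data are all hypothetical, and the query is only built here, not executed.

```python
import random

def in_clause(table, fk_values):
    """Hypothetical helper: build a parameterized WHERE ... IN (...) query for distinct keys."""
    ids = sorted(set(fk_values))
    placeholders = ", ".join("?" for _ in ids)
    return f"SELECT * FROM {table} WHERE id IN ({placeholders})", ids

random.seed(0)
# Sparse case: 1,000 addresses, each pointing at one of 50 states.
state_sql, state_ids = in_clause("state", [random.randint(1, 50) for _ in range(1000)])
# Near one-to-one case: 1,000 rows, each pointing at a different related row.
profile_sql, profile_ids = in_clause("profile", range(1, 1001))

print(len(state_ids))    # at most 50 IDs to send
print(len(profile_ids))  # 1000 IDs to send
```

In the sparse case the second query is small and the related rows are fetched once each; in the one-to-one case the ID list grows with the parent queryset and a JOIN may be cheaper.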

Everything discussed above is basically about communication with the database. On the Python side, however, prefetch_related has the extra benefit that a single Python object represents each row in the database. With select_related, a duplicate object is created in Python for each "parent" object. Since Python objects carry a fair amount of memory overhead, this can also be a consideration.
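The object-identity point can be illustrated without a database. This plain-Python sketch assumes, as described above, that a select_related-style loader builds a fresh related object from every joined row while a prefetch-style loader builds one object per fetched row and shares it; the State class and the rows are hypothetical.

```python
from dataclasses import dataclass

@dataclass(eq=False)  # keep identity-based equality, like ordinary Python objects
class State:
    id: int
    name: str

# Hypothetical joined rows: (person_id, state_id, state_name), one per person.
rows = [(1, 5, "Ohio"), (2, 5, "Ohio"), (3, 7, "Utah")]

# select_related-style: a new State object is built from every joined row.
joined_states = [State(state_id, name) for _, state_id, name in rows]

# prefetch_related-style: each state row is fetched once and the same object is shared.
fetched_once = {5: State(5, "Ohio"), 7: State(7, "Utah")}
shared_states = [fetched_once[state_id] for _, state_id, _ in rows]

print(joined_states[0] is joined_states[1])  # False: two objects for one database row
print(shared_states[0] is shared_states[1])  # True: one object per database row
```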

121

answered Sep 29 '22 05:09

CrazyCasta

Both methods serve the same purpose: avoiding unnecessary database queries. But they use different approaches to get there.

The only reason to use either of these methods is when a single large query is preferable to many small queries. Django uses the large query to create models in memory preemptively rather than performing on-demand queries against the database.
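The cost being avoided is loading related data on demand, one query per object. The sketch below uses sqlite3 with hypothetical author and book tables and a hypothetical run() helper that records every statement it sends, so the number of queries issued by on-demand loading can be compared with a single preemptive JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT,
                       author_id INTEGER REFERENCES author(id));
    INSERT INTO author VALUES (1, 'Ann'), (2, 'Ben');
    INSERT INTO book VALUES (1, 'x', 1), (2, 'y', 1), (3, 'z', 2), (4, 'w', 2);
""")

executed = []  # every statement sent through run(), so queries can be counted

def run(sql, params=()):
    executed.append(sql)
    return conn.execute(sql, params).fetchall()

def on_demand():
    """Lazy loading: 1 query for the books, then 1 more per book for its author."""
    pairs = []
    for title, author_id in run("SELECT title, author_id FROM book ORDER BY id"):
        (name,) = run("SELECT name FROM author WHERE id = ?", (author_id,))[0]
        pairs.append((title, name))
    return pairs

def preemptive():
    """One larger query fetches the books and their authors together."""
    return run(
        "SELECT book.title, author.name FROM book "
        "JOIN author ON book.author_id = author.id ORDER BY book.id"
    )

lazy = on_demand()
lazy_count = len(executed)
executed.clear()
eager = preemptive()
eager_count = len(executed)

print(lazy == eager)            # True
print(lazy_count, eager_count)  # 5 1
```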

select_related performs a join for each lookup and extends the SELECT to include the columns of all joined tables. However, this approach has a caveat.

Joins have the potential to multiply the number of rows in a query. When you perform a join over a foreign key or one-to-one field, the number of rows won't increase. However, many-to-many joins do not have this guarantee. So, Django restricts select_related to relations that won't unexpectedly result in a massive join.
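The row-multiplication caveat is easy to reproduce. In this sqlite3 sketch with hypothetical pizza, topping, pizza_toppings and order_item tables, joining along a foreign key keeps one row per order item, while joining through the many-to-many link table repeats each pizza once per topping.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pizza (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE topping (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE pizza_toppings (pizza_id INTEGER, topping_id INTEGER);  -- M2M link table
    CREATE TABLE order_item (id INTEGER PRIMARY KEY, pizza_id INTEGER REFERENCES pizza(id));
    INSERT INTO pizza VALUES (1, 'margherita'), (2, 'hawaiian');
    INSERT INTO topping VALUES (1, 'cheese'), (2, 'tomato'), (3, 'ham'), (4, 'pineapple');
    INSERT INTO pizza_toppings VALUES (1, 1), (1, 2), (2, 1), (2, 2), (2, 3), (2, 4);
    INSERT INTO order_item VALUES (1, 1), (2, 2), (3, 2);
""")

# ForeignKey join: each order_item matches at most one pizza, so there are still 3 rows.
fk_rows = conn.execute("""
    SELECT order_item.id, pizza.name
    FROM order_item JOIN pizza ON order_item.pizza_id = pizza.id
""").fetchall()

# Many-to-many join: each pizza row is repeated once per topping (2 + 4 = 6 rows).
m2m_rows = conn.execute("""
    SELECT pizza.name, topping.name
    FROM pizza
    JOIN pizza_toppings ON pizza_toppings.pizza_id = pizza.id
    JOIN topping ON topping.id = pizza_toppings.topping_id
""").fetchall()

print(len(fk_rows))   # 3
print(len(m2m_rows))  # 6
```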

The "join in Python" for prefetch_related is a little more alarming than it should be. It creates a separate query for each table to be joined and filters each of these tables with a WHERE IN clause, like:

SELECT "credential"."id",
       "credential"."uuid",
       "credential"."identity_id"
FROM   "credential"
WHERE  "credential"."identity_id" IN (84706, 48746, 871441, 84713, 76492, 84621, 51472);

Rather than performing a single join with potentially too many rows, each table is split into a separate query.
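A query like the one above is only half of the work; the other half is stitching its results onto the parent rows in Python. This sqlite3 sketch reuses the credential column names from that query, while the identity table, the username column and all data are hypothetical. Django's own implementation differs in detail.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "identity" (id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE "credential" (id INTEGER PRIMARY KEY, uuid TEXT,
                               identity_id INTEGER REFERENCES "identity"(id));
    INSERT INTO "identity" VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
    INSERT INTO "credential" VALUES (1, 'u-1', 1), (2, 'u-2', 1), (3, 'u-3', 2);
""")

# Query 1: the parent rows.
identities = conn.execute('SELECT id, username FROM "identity" ORDER BY id').fetchall()
identity_ids = [identity_id for identity_id, _ in identities]

# Query 2: every credential for those parents, filtered with WHERE ... IN (...).
placeholders = ", ".join("?" for _ in identity_ids)
sql = (
    'SELECT "credential"."id", "credential"."uuid", "credential"."identity_id" '
    'FROM "credential" '
    f'WHERE "credential"."identity_id" IN ({placeholders}) '
    'ORDER BY "credential"."id"'
)
credentials = conn.execute(sql, identity_ids).fetchall()

# The Python-side join: bucket children by foreign key, then attach them to each parent.
by_identity = defaultdict(list)
for _, uuid, identity_id in credentials:
    by_identity[identity_id].append(uuid)

result = {username: by_identity[identity_id] for identity_id, username in identities}
print(result)  # {'alice': ['u-1', 'u-2'], 'bob': ['u-3'], 'carol': []}
```

Identities with no credentials (carol here) end up with an empty list rather than disappearing from the result.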

answered Sep 29 '22 05:09

cdosborn
