If I do this:
post = Post.find_by_id(post_id, :include => :comments)
two queries are performed (one for the post data and another for the post's comments). Then when I do post.comments, another query is not performed because the data is already cached.
Is there a way to do just one query and still access the comments via post.comments?
You can avoid most n+1 queries in Rails by eager loading associations. Eager loading loads all of your associations (parent and children) once instead of n+1 times (which often happens with lazy loading, Rails' default). .includes also supports eager loading of nested associations.
The problem can typically be avoided by doing a join fetch in your query. This forces the fetch of the otherwise lazily loaded object, so the data is retrieved in one query instead of n+1 queries.
The n+1 query problem is one of the most common scalability bottlenecks. It arises when fetching a list of resources from a database where each resource has other resources associated with it: one query fetches the list, and then the associated resources may have to be queried separately for each item.
includes and joins both work with associated tables, but for different purposes. includes uses eager loading: use it when you want to fetch data along with the records of an associated table. joins does not load the association, which stays lazily loaded: use it when you only need the joined table in a condition and will not read any attributes from it.
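To make the query counts concrete, here is a rough sketch in plain Ruby. It does not use ActiveRecord at all: FakeDB, build_db, lazy_load and eager_load are made-up names for illustration, and the "database" is just an in-memory array that counts how many times it is asked for data.

```ruby
# In-memory stand-in for a database; every lookup counts as one query.
# (Hypothetical class for illustration, not part of Rails.)
class FakeDB
  attr_reader :query_count

  def initialize(posts, comments)
    @posts = posts
    @comments = comments
    @query_count = 0
  end

  # Roughly: SELECT * FROM posts
  def all_posts
    @query_count += 1
    @posts
  end

  # Roughly: SELECT * FROM comments WHERE post_id IN (...)
  def comments_for(post_ids)
    @query_count += 1
    @comments.select { |c| post_ids.include?(c[:post_id]) }
  end
end

# Five posts with three comments each.
def build_db
  posts = (1..5).map { |id| { id: id, title: "Post #{id}" } }
  comments = posts.flat_map do |post|
    (1..3).map { |n| { post_id: post[:id], body: "Comment #{n}" } }
  end
  FakeDB.new(posts, comments)
end

# Lazy loading: one query for the posts, then one more per post (n+1).
def lazy_load(db)
  db.all_posts.map do |post|
    post.merge(comments: db.comments_for([post[:id]]))
  end
end

# Eager loading: one query for the posts, one for all of their comments,
# then the two result sets are stitched together in memory.
def eager_load(db)
  posts = db.all_posts
  by_post = db.comments_for(posts.map { |p| p[:id] }).group_by { |c| c[:post_id] }
  posts.map { |post| post.merge(comments: by_post.fetch(post[:id], [])) }
end

lazy_db = build_db
lazy_load(lazy_db)

eager_db = build_db
eager_load(eager_db)

puts "lazy:  #{lazy_db.query_count} queries"
puts "eager: #{eager_db.query_count} queries"
```

With five posts, the lazy version issues six queries and the eager version two, while both end up with the same data.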
No, there is not. This is the intended behavior of :include, since the JOIN approach ultimately comes out to be inefficient.
For example, consider the following scenario: the Post model has 3 fields that you need to select, Comment has 2, and this particular post has 100 comments. Rails could run a single JOIN query along the lines of:
SELECT post.id, post.title, post.author_id, comment.id, comment.body
FROM posts AS post
INNER JOIN comments AS comment ON comment.post_id = post.id
WHERE post.id = 1
This would return the following table of results:
 post.id | post.title | post.author_id | comment.id | comment.body
---------+------------+----------------+------------+--------------
       1 | Hello!     |              1 |          1 | First!
       1 | Hello!     |              1 |          2 | Second!
       1 | Hello!     |              1 |          3 | Third!
       1 | Hello!     |              1 |          4 | Fourth!
...96 more...
You can see the problem already. The single-query JOIN approach, though it returns the data you need, returns it redundantly. When the database server sends the result set to Rails, it will send the post's ID, title, and author ID 100 times each. Now, suppose that the Post had 10 fields you were interested in, 8 of which were text blocks. Eww. That's a lot of data. Transferring data from the database to Rails does take work on both sides, both in CPU cycles and RAM, so minimizing that data transfer is important for making the app run faster and leaner.
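As a back-of-the-envelope check, the redundancy can be counted in plain Ruby. The helper names join_values and two_query_values are made up for this sketch, and it only counts individual field values, not bytes. It also ignores the post_id column the second query would need in practice to match comments to posts.

```ruby
# Count the individual field values sent for one post and its comments
# under each strategy. (Hypothetical helpers for illustration only.)

# Single JOIN: every result row repeats all post fields next to one
# comment's fields.
def join_values(post_fields, comment_fields, comment_count)
  comment_count * (post_fields + comment_fields)
end

# Two queries: the post's fields are sent once, and each comment row
# carries only its own fields.
def two_query_values(post_fields, comment_fields, comment_count)
  post_fields + comment_count * comment_fields
end

# The scenario above: 3 post fields, 2 comment fields, 100 comments.
puts join_values(3, 2, 100)       # 500
puts two_query_values(3, 2, 100)  # 203

# The wider post: 10 post fields, 2 comment fields, 100 comments.
puts join_values(10, 2, 100)      # 1200
puts two_query_values(10, 2, 100) # 210
```

The gap grows with both the number of comments and the width of the post row, and gets worse still when the repeated fields are large text blocks.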
The Rails devs crunched the numbers, and most applications run better when using multiple queries that only fetch each bit of data once rather than one query that has the potential to get hugely redundant.
Of course, there comes a time in every developer's life when a join is necessary in order to run complex conditions, and that can be achieved by replacing :include with :joins. For prefetching relationships, however, the approach Rails takes in :include is much better for performance.