I have the following setup with ActiveRecord and MySQL:
User has many groups through memberships
Group has many users through memberships
There is also an index by group_id and user_id described in schema.rb:
add_index "memberships", ["group_id", "user_id"], name: "uugj_index", using: :btree
User.where(id: Membership.uniq.pluck(:user_id))
(3.8ms) SELECT DISTINCT memberships.user_id FROM memberships
User Load (11.0ms) SELECT users.* FROM users WHERE users.id IN (1, 2...)
User.where(id: Membership.uniq.select(:user_id))
User Load (15.2ms) SELECT users.* FROM users WHERE users.id IN (SELECT DISTINCT memberships.user_id FROM memberships)
User.uniq.joins(:memberships)
User Load (135.1ms) SELECT DISTINCT users.* FROM users INNER JOIN memberships ON memberships.user_id = users.id
What is the best approach for doing this? Why is the query with the join so much slower?
What is the difference between includes and joins? The most important thing to understand is that each has its own optimal use case. includes eager-loads the associated records, whereas joins only adds a JOIN to the query and leaves the associations to be loaded lazily. Both are powerful, but either can hurt performance when misused.
You can use Active Record joins to query one model based on data from a related table. For example, you can return specific categories based on each category's products. Technical note: joins in Rails runs an SQL INNER JOIN and returns the records from the model you're operating on.
The first query is bad because it sucks all of the user ids into a Ruby array and then sends them back to the database. If you have a lot of users, that's a huge array and a huge amount of bandwidth, plus 2 roundtrips to the database instead of one. Furthermore, the database has no way to efficiently handle that huge array.
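To get a rough feel for the cost of shipping ids back to the database, the sketch below builds the kind of IN (...) statement the first approach produces and compares its size with the subquery form. The id range (100,000 consecutive ids) and the helper name in_list_sql are assumptions made for illustration, not figures from the question.

```ruby
# Hypothetical helper: builds the SQL text that a pluck-then-where
# approach would send back to the database for a given list of ids.
def in_list_sql(ids)
  "SELECT users.* FROM users WHERE users.id IN (#{ids.join(', ')})"
end

# Assumed data set: 100,000 distinct user ids (illustrative only).
ids = (1..100_000).to_a

pluck_sql    = in_list_sql(ids)
subquery_sql = "SELECT users.* FROM users WHERE users.id IN " \
               "(SELECT DISTINCT memberships.user_id FROM memberships)"

puts "IN list statement:  #{pluck_sql.bytesize} bytes"
puts "Subquery statement: #{subquery_sql.bytesize} bytes"
```

The subquery statement stays the same size no matter how many memberships exist, while the IN list grows with the number of ids.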
The second and third approaches are both efficient database-driven solutions (one is a subquery, and one is a join), but you need to have the proper index. You need an index on user_id in the memberships table:
add_index :memberships, :user_id
The index that you already have would only be helpful if you wanted to find all of the users that belong to a particular group.
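Why column order matters can be sketched with a sorted array standing in for a composite index. This is a simplified model, not how MySQL actually stores or scans a B-tree, and the sample rows are assumptions for illustration only.

```ruby
# Assumed sample rows of [group_id, user_id]. A composite index on
# (group_id, user_id) keeps entries sorted by group_id first, then user_id.
index = [[1, 7], [1, 9], [2, 3], [2, 7], [3, 1], [3, 9]].sort

# Lookup by the leading column: binary search can jump straight to the
# first entry for group 2, then read neighbours while the group matches.
first = index.bsearch_index { |group_id, _| group_id >= 2 }
users_in_group_2 = index.drop(first)
                        .take_while { |group_id, _| group_id == 2 }
                        .map { |_, user_id| user_id }

# Lookup by the second column alone: matching entries are scattered
# through the ordering, so every entry has to be examined.
groups_of_user_7 = index.select { |_, user_id| user_id == 7 }
                        .map { |group_id, _| group_id }

p users_in_group_2
p groups_of_user_7
```

In this model, a separate structure sorted by user_id would allow the same kind of direct lookup for the second case, which is the role the user_id index plays.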
Update:
If you have a lot of columns and data in your users table, the DISTINCT users.* in the 3rd query is going to be fairly slow because MySQL has to compare a lot of data in order to ensure uniqueness.
To be clear: this is not intrinsic slowness with JOIN, it's slowness with DISTINCT. For example, here is a way to avoid the DISTINCT on users.* and still use a JOIN:
SELECT users.* FROM users
INNER JOIN (SELECT DISTINCT memberships.user_id FROM memberships) AS user_ids
ON user_ids.user_id = users.id;
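The difference between the two join shapes can be modelled in plain Ruby with in-memory rows. This is only a sketch of the row-level logic under assumed sample data, not a picture of how MySQL executes either query.

```ruby
# Assumed sample data (illustrative only).
users = [
  { id: 1, name: "ann" },
  { id: 2, name: "bob" },
  { id: 3, name: "cyd" }
]
memberships = [
  { group_id: 10, user_id: 1 },
  { group_id: 11, user_id: 1 },
  { group_id: 10, user_id: 2 }
]

# Plain INNER JOIN: one output row per matching membership, so user 1
# appears twice.
joined = memberships.flat_map do |m|
  users.select { |u| u[:id] == m[:user_id] }
end

# DISTINCT users.*: duplicates are removed by comparing whole rows.
distinct_rows = joined.uniq

# Derived-table variant: deduplicate the narrow user_id column first,
# then join, so no whole-row comparison is needed.
user_ids = memberships.map { |m| m[:user_id] }.uniq
via_subquery = users.select { |u| user_ids.include?(u[:id]) }

p joined.map { |u| u[:id] }
p distinct_rows.map { |u| u[:id] }
p via_subquery.map { |u| u[:id] }
```

Both deduplicated forms return the same users; the difference is whether uniqueness is checked on full user rows or on a single integer column.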
Given all of that, in this case, I believe the 2nd query is going to be the best approach for you. The 2nd query should be even faster than reported in your original results if you add the above index. Please retry the second approach, if you haven't done so yet since adding the index.
Although the 1st query has some slowness issues of its own, from your comment, it's clear that it is still faster than the 3rd query (at least, for your particular dataset). The trade-offs between these approaches are going to depend on your particular dataset: how many users you have and how many memberships you have. Generally speaking, I believe the 1st approach is still the worst even if it ends up being faster.
Also, please note that the index I'm recommending is particularly designed for the three queries you listed in your question. If you have other kinds of queries against these tables, you may be better served by additional indexes, or possibly multi-column indexes, as @tata mentioned in his/her answer.
The query with the join is slow because it loads all columns from the database, even though Rails doesn't preload them this way. If you need preloading, you should use includes (or similar) instead. But includes will be even slower because it will construct objects for all associations. Also, you should know that
User.where.not(id: Membership.uniq.select(:user_id))
will return an empty set if there is at least one membership with user_id equal to nil, while the query with pluck will return the correct relation.
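The NULL behaviour comes from SQL's three-valued logic: x NOT IN (1, NULL) is never true, because the comparison with NULL is unknown. The sketch below models that in plain Ruby. The helper sql_not_in and the sample ids are hypothetical, and the pluck branch assumes ActiveRecord turns a nil inside an array condition into a separate IS NULL check, which is my understanding of its behaviour rather than something stated above.

```ruby
# Hypothetical model of SQL's three-valued logic for `value NOT IN (list)`.
# Returns true, false, or nil (nil stands in for SQL's UNKNOWN).
def sql_not_in(value, list)
  return false if list.compact.include?(value) # a definite match
  return nil if list.include?(nil)             # NULL in list => UNKNOWN
  true
end

user_ids = [1, 2, 3]            # assumed users
membership_user_ids = [1, nil]  # one membership has user_id NULL

# Subquery form: the NULL stays in the list the database evaluates,
# and WHERE keeps only rows whose condition is strictly true.
via_subquery = user_ids.select do |id|
  sql_not_in(id, membership_user_ids) == true
end

# pluck form (assumption): nil is split out into an IS NULL check,
# so the remaining list contains no NULL.
via_pluck = user_ids.reject do |id|
  membership_user_ids.compact.include?(id) || id.nil?
end

p via_subquery
p via_pluck
```

One way to keep the subquery form safe is to exclude NULLs inside the subquery itself, for example by adding a condition that user_id is not NULL.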