I have an ActiveRecord query that uses the OR operator to chain together 2 queries. The results come back fine, but the speed of executing the combined query is ~10 times as slow as executing either of the 2 queries in on their own.
We have an Event
model and an Invitation
model. A User
can be invited to an Event
by being targeted through an invitation filter, or by being individually invited by having an Invitation
record.
So when determining how many users are invited to a particular event, we have to look at all those with Invitations
and all those matching the filter. We do that here:
@invited_count = @invited_by_individual.or(@invited_by_filter).distinct.count(:id)
It's important to note, both @invited_by_individual
and @invited_by_filter
relations have references
and includes
statements within them.
Now, the problem is that when we execute that query, it takes about 1200ms. If we were to do the queries individually, each of them only take about 80ms. So @invited_by_filter.distinct.count
and @invited_by_individual.distinct.count
both return results in about 80ms, but neither of these is complete on its own.
Is there any way to speed up the query with the OR operator? Why is this happening in the first place?
Here is the SQL generated by the ActiveRecord queries:
Fast, single query:
(79.7ms)
SELECT COUNT(DISTINCT "users"."id")
FROM "users"
LEFT OUTER JOIN "invitations"
ON "invitations"."user_id" = "users"."id"
WHERE "invitations"."event_id" = $1 [["event_id", 732]]
Slow, with combined query:
(1220.7ms)
SELECT COUNT(DISTINCT "users"."id")
FROM "users"
LEFT OUTER JOIN "invitations"
ON "invitations"."user_id" = "users"."id"
WHERE ("invitations"."event_id" = $1 OR "users"."organization_id" = $2) [["event_id", 732], ["organization_id", 13]]
Update, here's the EXPLAIN:
(1418.2ms) SELECT COUNT(DISTINCT "users"."id") FROM "users" LEFT OUTER JOIN "invitations" ON "invitations"."user_id" = "users"."id" WHERE ("users"."root_organization_id" = $1 OR "invitations"."event_id" = $2) [["root_organization_id", -1], ["event_id", 749]]
=>
EXPLAIN for: SELECT COUNT(DISTINCT "users"."id") FROM "users" LEFT OUTER JOIN "invitations" ON "invitations"."user_id" = "users"."id" WHERE ("users"."root_organization_id" = $1 OR "invitations"."event_id" = $2) [["root_organization_id", -1], ["event_id", 749]]
#=> QUERY PLAN
Aggregate (cost=121781.56..121781.57 rows=1 width=8)
-> Hash Right Join (cost=113248.88..121778.64 rows=1165 width=8)
Hash Cond: (invitations.user_id = users.id)
Filter: ((users.root_organization_id = '-1'::integer) OR (invitations.event_id = 749))
-> Seq Scan on invitations (cost=0.00..1299.70 rows=63470 width=8)
-> Hash (cost=93513.28..93513.28 rows=1135328 width=12)
-> Seq Scan on users (cost=0.00..93513.28 rows=1135328 width=12)
(7 rows)
Update 2, EXPLAIN for queries ran individually, does use the indices:
(91.5ms) SELECT COUNT(*) FROM "users" INNER JOIN "invitations" ON "invitations"."user_id" = "users"."id" WHERE "users"."root_organization_id" = $1 [["root_organization_id", -1]]
=>
EXPLAIN for: SELECT COUNT(*) FROM "users" INNER JOIN "invitations" ON "invitations"."user_id" = "users"."id" WHERE "users"."root_organization_id" = $1 [["root_organization_id", -1]]
#=> QUERY PLAN
Aggregate (cost=19.05..19.06 rows=1 width=8)
-> Nested Loop (cost=0.72..19.05 rows=1 width=0)
-> Index Scan using index_users_on_root_organization_id on users (cost=0.43..4.45 rows=1 width=8)
Index Cond: (root_organization_id = '-1'::integer)
-> Index Only Scan using index_invitations_on_user_id on invitations (cost=0.29..14.57 rows=3 width=4)
Index Cond: (user_id = users.id)
(6 rows)
and
EXPLAIN for: SELECT COUNT(DISTINCT "users"."id") FROM "users" LEFT OUTER JOIN "invitations" ON "invitations"."user_id" = "users"."id" WHERE "invitations"."event_id" = $1 [["event_id", 749]]
#=> QUERY PLAN
Aggregate (cost=536.34..536.35 rows=1 width=8)
-> Nested Loop (cost=0.72..536.19 rows=62 width=8)
-> Index Scan using index_invitations_on_event_id on invitations (cost=0.29..11.98 rows=62 width=4)
Index Cond: (event_id = 749)
-> Index Only Scan using users_pkey on users (cost=0.43..8.45 rows=1 width=8)
Index Cond: (id = invitations.user_id)
(6 rows)
UNION
enables you to leverage both indexes while still preventing duplicates.
User.from(
"(#{@invited_by_individual.to_sql}
UNION
#{@invited_by_filter.to_sql})"
).count
This is your query that uses OR
:
SELECT COUNT(DISTINCT "users"."id")
FROM "users"
LEFT OUTER JOIN "invitations"
ON "invitations"."user_id" = "users"."id"
WHERE ("invitations"."event_id" = $1 OR "users"."organization_id" = $2)
If you try the following query in Postgres I expect it to produce the same result, but work faster:
SELECT
COUNT(DISTINCT id) AS cc
FROM
(
SELECT
"invitations"."user_id" AS id
FROM
"invitations"
WHERE
("invitations"."event_id" = $1)
UNION ALL
SELECT
"users"."id"
FROM
"users"
WHERE
("users"."organization_id" = $2)
) AS T
;
If you have indexes on "invitations"."event_id"
and on "users"."organization_id"
, engine should use them. If you don't have such indexes, create them.
The query with OR
is slow, because the optimizer is not smart enough to perform this translation and split original query into two parts. When you run each part separately the engine sees that it can use an appropriate index. When the query joins two tables and have an OR
condition in the WHERE
filter no single index can return the rows needed, so the engine doesn't attempt to use any index. It reads all 1135328 rows from the users
table and it reads all 63470 rows from the invitations
table. Naturally, it is slow.
I have no idea how to translate this query to the ActiveRecord syntax.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With