I'm trying to optimize the following join query:
A notification is a record that says whether a user has read some activity. One notification points to one activity but many users can be notified about an activity. The activity record has some columns such as the workspace the activity is in and the type of activity.
This query gets the user non-comment notifications that have been read in a specific workspace ordered by time.
explain analyze
select activity.id from activity, notification
where notification.user_id = '9a51f675-e1e2-46e5-8bcd-6bc535c7e7cb'
and notification.read = true
and notification.activity_id = activity.id
and activity.space_id = '6d702c09-8795-4185-abb3-dc6b3e8907dc'
and activity.type != 'commented'
order by activity.end_time desc
limit 20;
The problem is that this query has to run through every notification the user has every gotten.
Limit (cost=4912.35..4912.36 rows=1 width=24) (actual time=138.767..138.779 rows=20 loops=1)
-> Sort (cost=4912.35..4912.36 rows=1 width=24) (actual time=138.766..138.770 rows=20 loops=1)
Sort Key: activity.end_time DESC
Sort Method: top-N heapsort Memory: 27kB
-> Nested Loop (cost=32.57..4912.34 rows=1 width=24) (actual time=1.354..138.606 rows=447 loops=1)
-> Bitmap Heap Scan on notification (cost=32.01..3847.48 rows=124 width=16) (actual time=1.341..6.639 rows=1218 loops=1)
Recheck Cond: (user_id = '9a51f675-e1e2-46e5-8bcd-6bc535c7e7cb'::uuid)
Filter: read
Rows Removed by Filter: 4101
Heap Blocks: exact=4774
-> Bitmap Index Scan on notification_user_id_idx (cost=0.00..31.98 rows=988 width=0) (actual time=0.719..0.719 rows=5355 loops=1)
Index Cond: (user_id = '9a51f675-e1e2-46e5-8bcd-6bc535c7e7cb'::uuid)
-> Index Scan using activity_pkey on activity (cost=0.56..8.59 rows=1 width=24) (actual time=0.108..0.108 rows=0 loops=1218)
Index Cond: (id = notification.activity_id)
Filter: ((type <> 'commented'::activity_type) AND (space_id = '6d702c09-8795-4185-abb3-dc6b3e8907dc'::uuid))
Rows Removed by Filter: 1
Planning time: 0.428 ms
Execution time: 138.825 ms
Edit: Here is the performance after the cache has been warmed.
Limit (cost=4912.35..4912.36 rows=1 width=24) (actual time=13.618..13.629 rows=20 loops=1)
-> Sort (cost=4912.35..4912.36 rows=1 width=24) (actual time=13.617..13.621 rows=20 loops=1)
Sort Key: activity.end_time DESC
Sort Method: top-N heapsort Memory: 27kB
-> Nested Loop (cost=32.57..4912.34 rows=1 width=24) (actual time=1.365..13.447 rows=447 loops=1)
-> Bitmap Heap Scan on notification (cost=32.01..3847.48 rows=124 width=16) (actual time=1.352..6.606 rows=1218 loops=1)
Recheck Cond: (user_id = '9a51f675-e1e2-46e5-8bcd-6bc535c7e7cb'::uuid)
Filter: read
Rows Removed by Filter: 4101
Heap Blocks: exact=4774
-> Bitmap Index Scan on notification_user_id_idx (cost=0.00..31.98 rows=988 width=0) (actual time=0.729..0.729 rows=5355 loops=1)
Index Cond: (user_id = '9a51f675-e1e2-46e5-8bcd-6bc535c7e7cb'::uuid)
-> Index Scan using activity_pkey on activity (cost=0.56..8.59 rows=1 width=24) (actual time=0.005..0.005 rows=0 loops=1218)
Index Cond: (id = notification.activity_id)
Filter: ((type <> 'commented'::activity_type) AND (space_id = '6d702c09-8795-4185-abb3-dc6b3e8907dc'::uuid))
Rows Removed by Filter: 1
Planning time: 0.438 ms
Execution time: 13.673 ms
I could create a multi-column index on user_id and read, but that doesn't solve the issue I'm trying to solve.
I could solve this problem myself by manually denormalizing the data, adding the space_id, type, and end_time columns in the notification record, but that seems like it should be unnecessary.
I would expect Postgres to be able create an index across the two tables, but everything I read so far says this isn't possible.
So my question: what is the best way to optimize this query?
Edit: After creating the suggested indexes:
create index tmp_index_1 on activity using btree (
space_id,
id,
end_time
) where (
type != 'commented'
);
create index tmp_index_2 on notification using btree (
user_id,
activity_id
) where (
read = true
);
The query performance improved 3x.
explain analyse
select activity.id from activity
INNER JOIN notification ON notification.user_id = '9a51f675-e1e2-46e5-8bcd-6bc535c7e7cb'
and notification.read = true
and notification.activity_id = activity.id
and activity.space_id = '6d702c09-8795-4185-abb3-dc6b3e8907dc'
and activity.type != 'commented'
order by activity.end_time desc
limit 20;
Limit (cost=955.26..955.27 rows=1 width=24) (actual time=4.386..4.397 rows=20 loops=1)
-> Sort (cost=955.26..955.27 rows=1 width=24) (actual time=4.385..4.389 rows=20 loops=1)
Sort Key: activity.end_time DESC
Sort Method: top-N heapsort Memory: 27kB
-> Nested Loop (cost=1.12..955.25 rows=1 width=24) (actual time=0.035..4.244 rows=447 loops=1)
-> Index Only Scan using tmp_index_2 on notification (cost=0.56..326.71 rows=124 width=16) (actual time=0.017..1.039 rows=1218 loops=1)
Index Cond: (user_id = '9a51f675-e1e2-46e5-8bcd-6bc535c7e7cb'::uuid)
Heap Fetches: 689
-> Index Only Scan using tmp_index_1 on activity (cost=0.56..5.07 rows=1 width=24) (actual time=0.002..0.002 rows=0 loops=1218)
Index Cond: ((space_id = '6d702c09-8795-4185-abb3-dc6b3e8907dc'::uuid) AND (id = notification.activity_id))
Heap Fetches: 1
Planning time: 0.484 ms
Execution time: 4.428 ms
The one thing that still bothers me about this query is the rows=1218 and loops=1218. This query is looping through all of the read user notifications and querying against the activities table.
I would expect to be able to create a single index to read all of this in a manner that would mimic denormalizing this data. For example, if I add space_id, type, and end_time to the notification table, I could create the following index and read in fractions of a millisecond.
create index tmp_index_3 on notification using btree (
user_id,
space_id,
end_time desc
) where (
read = true
and type != 'commented'
);
Is this not currently possible within Postgres without denormalizing?
Add the index:
create index ix1_activity on activity (space_id, type, end_time, id);
create index ix2_notification on notification (activity_id, user_id, read);
These two "covering indexes" could make your query real fast.
Additionally, with a little bit of luck, it will read the activity table first (only 20 rows), and perform a Nested Loop Join (NLJ) on notification. That is, a very limited index walk.
looking to your code you should use for filter a composite index on
table notification columns : user_id, read, activity_id
table activity columns space_id, type, id
and for query and order by you could also add end_time in composite for activity
table activity columns space_id, type, id, end_time
and you should also use explict inner join sintax
select activity.id from activity
INNER JOIN notification ON notification.user_id = '9a51f675-e1e2-46e5-8bcd-6bc535c7e7cb'
and notification.read = true
and notification.activity_id = activity.id
and activity.space_id = '6d702c09-8795-4185-abb3-dc6b3e8907dc'
and activity.type != 'commented'
order by activity.end_time desc
limit 20;
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With