 

Indexing columns across multiple tables in PostgreSQL

I'm trying to optimize the following join query:

A notification is a record that says whether a user has read some activity. One notification points to one activity but many users can be notified about an activity. The activity record has some columns such as the workspace the activity is in and the type of activity.

This query gets the user's read, non-comment notifications in a specific workspace, ordered by time.

explain analyze
select activity.id from activity, notification
where notification.user_id = '9a51f675-e1e2-46e5-8bcd-6bc535c7e7cb'
and notification.read = true
and notification.activity_id = activity.id
and activity.space_id = '6d702c09-8795-4185-abb3-dc6b3e8907dc'
and activity.type != 'commented'
order by activity.end_time desc
limit 20;

The problem is that this query has to run through every notification the user has ever gotten.

Limit  (cost=4912.35..4912.36 rows=1 width=24) (actual time=138.767..138.779 rows=20 loops=1)
  ->  Sort  (cost=4912.35..4912.36 rows=1 width=24) (actual time=138.766..138.770 rows=20 loops=1)
        Sort Key: activity.end_time DESC
        Sort Method: top-N heapsort  Memory: 27kB
        ->  Nested Loop  (cost=32.57..4912.34 rows=1 width=24) (actual time=1.354..138.606 rows=447 loops=1)
              ->  Bitmap Heap Scan on notification  (cost=32.01..3847.48 rows=124 width=16) (actual time=1.341..6.639 rows=1218 loops=1)
                    Recheck Cond: (user_id = '9a51f675-e1e2-46e5-8bcd-6bc535c7e7cb'::uuid)
                    Filter: read
                    Rows Removed by Filter: 4101
                    Heap Blocks: exact=4774
                    ->  Bitmap Index Scan on notification_user_id_idx  (cost=0.00..31.98 rows=988 width=0) (actual time=0.719..0.719 rows=5355 loops=1)
                          Index Cond: (user_id = '9a51f675-e1e2-46e5-8bcd-6bc535c7e7cb'::uuid)
              ->  Index Scan using activity_pkey on activity  (cost=0.56..8.59 rows=1 width=24) (actual time=0.108..0.108 rows=0 loops=1218)
                    Index Cond: (id = notification.activity_id)
                    Filter: ((type <> 'commented'::activity_type) AND (space_id = '6d702c09-8795-4185-abb3-dc6b3e8907dc'::uuid))
                    Rows Removed by Filter: 1
Planning time: 0.428 ms
Execution time: 138.825 ms

Edit: Here is the performance after the cache has been warmed.

Limit  (cost=4912.35..4912.36 rows=1 width=24) (actual time=13.618..13.629 rows=20 loops=1)
  ->  Sort  (cost=4912.35..4912.36 rows=1 width=24) (actual time=13.617..13.621 rows=20 loops=1)
        Sort Key: activity.end_time DESC
        Sort Method: top-N heapsort  Memory: 27kB
        ->  Nested Loop  (cost=32.57..4912.34 rows=1 width=24) (actual time=1.365..13.447 rows=447 loops=1)
              ->  Bitmap Heap Scan on notification  (cost=32.01..3847.48 rows=124 width=16) (actual time=1.352..6.606 rows=1218 loops=1)
                    Recheck Cond: (user_id = '9a51f675-e1e2-46e5-8bcd-6bc535c7e7cb'::uuid)
                    Filter: read
                    Rows Removed by Filter: 4101
                    Heap Blocks: exact=4774
                    ->  Bitmap Index Scan on notification_user_id_idx  (cost=0.00..31.98 rows=988 width=0) (actual time=0.729..0.729 rows=5355 loops=1)
                          Index Cond: (user_id = '9a51f675-e1e2-46e5-8bcd-6bc535c7e7cb'::uuid)
              ->  Index Scan using activity_pkey on activity  (cost=0.56..8.59 rows=1 width=24) (actual time=0.005..0.005 rows=0 loops=1218)
                    Index Cond: (id = notification.activity_id)
                    Filter: ((type <> 'commented'::activity_type) AND (space_id = '6d702c09-8795-4185-abb3-dc6b3e8907dc'::uuid))
                    Rows Removed by Filter: 1
Planning time: 0.438 ms
Execution time: 13.673 ms

I could create a multi-column index on user_id and read, but that doesn't solve the real issue here.
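For reference, a minimal sketch of that index (the name is only illustrative):

-- Multi-column index on the notification filter columns; it narrows the
-- scan on notification, but the per-row lookups into activity remain.
create index notification_user_id_read_idx on notification using btree (
    user_id,
    read
);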

I could solve this problem myself by manually denormalizing the data, adding space_id, type, and end_time columns to the notification record, but that seems like it should be unnecessary.
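For concreteness, a sketch of what that denormalization would look like (the end_time column type is assumed here, and keeping the copied columns in sync would need triggers or application code that is not shown):

-- Copy the activity columns needed for filtering and ordering onto notification.
alter table notification
    add column space_id uuid,
    add column type activity_type,
    add column end_time timestamptz;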

I would expect Postgres to be able to create an index across the two tables, but everything I've read so far says this isn't possible.

So my question: what is the best way to optimize this query?


Edit: After creating the suggested indexes:

create index tmp_index_1 on activity using btree (
    space_id, 
    id, 
    end_time
) where (
    type != 'commented'
);

create index tmp_index_2 on notification using btree (
    user_id,
    activity_id
) where (
    read = true
);

The query performance improved 3x.

explain analyse
select activity.id from activity
INNER JOIN notification  ON  notification.user_id = '9a51f675-e1e2-46e5-8bcd-6bc535c7e7cb'
    and notification.read = true
    and notification.activity_id = activity.id
    and activity.space_id = '6d702c09-8795-4185-abb3-dc6b3e8907dc'
    and activity.type != 'commented'
order by activity.end_time desc
limit 20;

Limit  (cost=955.26..955.27 rows=1 width=24) (actual time=4.386..4.397 rows=20 loops=1)
  ->  Sort  (cost=955.26..955.27 rows=1 width=24) (actual time=4.385..4.389 rows=20 loops=1)
        Sort Key: activity.end_time DESC
        Sort Method: top-N heapsort  Memory: 27kB
        ->  Nested Loop  (cost=1.12..955.25 rows=1 width=24) (actual time=0.035..4.244 rows=447 loops=1)
              ->  Index Only Scan using tmp_index_2 on notification  (cost=0.56..326.71 rows=124 width=16) (actual time=0.017..1.039 rows=1218 loops=1)
                    Index Cond: (user_id = '9a51f675-e1e2-46e5-8bcd-6bc535c7e7cb'::uuid)
                    Heap Fetches: 689
              ->  Index Only Scan using tmp_index_1 on activity  (cost=0.56..5.07 rows=1 width=24) (actual time=0.002..0.002 rows=0 loops=1218)
                    Index Cond: ((space_id = '6d702c09-8795-4185-abb3-dc6b3e8907dc'::uuid) AND (id = notification.activity_id))
                    Heap Fetches: 1
Planning time: 0.484 ms
Execution time: 4.428 ms

The one thing that still bothers me about this query is the rows=1218 and loops=1218: it still loops through all of the user's read notifications and probes the activity table for each one.

I would expect to be able to create a single index that reads all of this in a manner that mimics denormalizing the data. For example, if I added space_id, type, and end_time to the notification table, I could create the following index and satisfy the query in a fraction of a millisecond.

create index tmp_index_3 on notification using btree (
    user_id,
    space_id,
    end_time desc
) where (
    read = true 
    and type != 'commented'
);

Is this not currently possible within Postgres without denormalizing?

Asked Oct 23 '25 by Chet


2 Answers

Add the indexes:

create index ix1_activity on activity (space_id, type, end_time, id);

create index ix2_notification on notification (activity_id, user_id, read);

These two "covering" indexes should make your query really fast.

Additionally, with a little bit of luck, the planner will read the activity table first (only 20 rows) and perform a Nested Loop Join (NLJ) against notification; that is, a very limited index walk.
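Not part of the original suggestion, but one way to express the same result set so the planner is free to drive the join from activity (a sketch; the EXISTS form also deduplicates if a user ever has more than one notification for the same activity):

-- Same filters and ordering, written with EXISTS so the planner can scan
-- activity and probe notification per candidate row via ix2_notification.
select activity.id
from activity
where activity.space_id = '6d702c09-8795-4185-abb3-dc6b3e8907dc'
  and activity.type != 'commented'
  and exists (
      select 1
      from notification
      where notification.activity_id = activity.id
        and notification.user_id = '9a51f675-e1e2-46e5-8bcd-6bc535c7e7cb'
        and notification.read = true
  )
order by activity.end_time desc
limit 20;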

Answered Oct 26 '25 by The Impaler


Looking at your query, you should use composite indexes for filtering:

    table notification, columns: user_id, read, activity_id

    table activity, columns: space_id, type, id

For the ORDER BY you could also add end_time to the composite index on activity:

    table activity, columns: space_id, type, id, end_time
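Written out as DDL (the index names are just placeholders):

create index ix_notification_filter on notification (user_id, read, activity_id);

create index ix_activity_filter on activity (space_id, type, id, end_time);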

You should also use explicit INNER JOIN syntax:

select activity.id from activity
INNER JOIN notification  ON  notification.user_id = '9a51f675-e1e2-46e5-8bcd-6bc535c7e7cb'
    and notification.read = true
    and notification.activity_id = activity.id
    and activity.space_id = '6d702c09-8795-4185-abb3-dc6b3e8907dc'
    and activity.type != 'commented'
order by activity.end_time desc
limit 20;
Answered Oct 26 '25 by ScaisEdge


