Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Postgresql inbox query

Tags:

sql

postgresql

I have a Messages table that looks like this:

                    Messages
+-----+------------+-------------+--------------+
|  id |  sender_id | receiver_id |  created_at  |
+-----------------------------------------------+
|  1  |      1     |      2      |   1/1/2013   |
|  2  |      1     |      2      |   1/1/2013   |
|  3  |      2     |      1      |   1/2/2013   |
|  4  |      3     |      2      |   1/2/2013   |
|  5  |      3     |      2      |   1/3/2013   |
|  6  |      5     |      4      |   1/4/2013   |
+-----------------------------------------------+

Where a 'thread' is a group of messages between a given sender_id and receiver_id I want a query to return the most recent 10 messages for the most recent 10 threads where either the sender_id or receiver_id is a given id.

Expected output where given user_id is 5:

+-----+------------+-------------+--------------+
|  id |  sender_id | receiver_id |  created_at  |
+-----------------------------------------------+
|  1  |      5     |      2      |   1/4/2013   |
|  2  |      5     |      2      |   1/4/2013   |
|  3  |      2     |      5      |   1/4/2013   |
|  4  |      3     |      5      |   1/4/2013   |
|  5  |      5     |      2      |   1/3/2013   |
|  6  |      5     |      4      |   1/3/2013   |
+-----------------------------------------------+

up to a limit of 10 messages between, for example, user 5 and 2 (above there are 4) and a limit of 10 threads (above there are 3).

I've been trying with this sort of query using a subquery but haven't managed to get the second limit on the number of distinct threads.

SELECT * FROM (SELECT DISTINCT ON (sender_id, receiver_id) messages.* 
FROM messages 
WHERE (receiver_id = 5 OR sender_id = 5) ORDER BY sender_id, receiver_id, 
created_at DESC)   
q ORDER BY created_at DESC 
LIMIT 10 OFFSET 0;

I'm considering creating a new Thread table containing a thread_id field which would be the concatenation of sender_id + receiver_id and then just joining on Messages but I have a sneaky suspicion that it should be doable with just one table.

like image 337
johnnymire Avatar asked Feb 05 '13 18:02

johnnymire


3 Answers

The tidiest query I could imagine to resolve your problem within one query is the following one:

select * from (
  select row_number() 
    over (partition by sender_id, receiver_id order by created_at desc) as rn, m.*
  from Messages m
  where (m.sender_id, m.receiver_id) in (
    select sender_id, receiver_id
    from Messages
    where sender_id = <id> or receiver_id = <id>
    group by sender_id, receiver_id
    order by max(created_at) desc
    limit 10 offset 0
  )
) res where res.rn <= 10

The row_number() over (partition by sender_id, receiver_id order by created_at desc) column will contain the row number of each message within each thread (it will be like the record number if you run a separate query to query just for one thread). Apart from this row number you query the message itself if it is contained in the 10 topmost threads (that is made by that (m.sender_id, m.receiver_id) in ...query.... And finally as you want just 10 topmost messages you limit the row number to be lower or equal to 10.

like image 141
Karel Horak Avatar answered Nov 15 '22 16:11

Karel Horak


I would suggest taking couling's answer and slightly modifying it so that it does provide effectively two queries using a common table expression:

WITH threads (sender_id, receiver_id, latest) as (
        select sender, 
               receiver,
               max(sent) 
          from sof_messages
         where receiver = <user>
            or sender = <user>
         group by sender,
               receiver
         order by 3
         limit 10
 ), 
 messages ([messages fields listed here], rank) as (
         select m.*, 
                rank() over (partition by (sender, receiver), order by sent desc)
           from sof_messages
          WHERE (sender, receiver) in (select (sender, receiver) from threads))
 SELECT * from messages where rank <= 10;

This has the advantage of allowing the planner to have a pretty good idea of when to use indexes here. In essence each of the three pieces to the query is independently planned.

like image 2
Chris Travers Avatar answered Nov 15 '22 15:11

Chris Travers


I'm posting this to show what can be done.

I do not really recommend using it.

It would be much better to do two separate queries: 1 to retrieve the 10 most recent threads and 1 repeated to pull back the 10 most recent messages for each thread.

However you may achieve your goal with the rank() window function as shown below.

select * from (
      select message.*,
             rank() over (partition by message.sender, message.receiver 
                              order by sent desc )  
      from sof_messages message,
           (
            select sender, 
                   receiver,
                   max(sent) 
              from sof_messages
             where receiver = <user>
                or sender = <user>
             group by sender,
                   receiver
             order by 3
             limit 10
           ) thread
      where message.sender = thread.sender
        and message.receiver = thread.receiver
      ) message_list

where rank <= 10

There are a couple of different queries which will achieve your goal with window functions, none of them particularly clean.

like image 1
Philip Couling Avatar answered Nov 15 '22 14:11

Philip Couling