Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQL query - how to fetch non-read messages efficiently

Tags:

sql

How do I best gather messages that have not been read by a given user?

Existing tables

Message table
----------------------------------
id    title    body    sentAt

User table
----------------------------------
id    username

Read Messages table
----------------------------------
user_id    message_id

I'm thinking something like

select 
  m.id, m.title, m.sentAt, u.username
from 
  message m,
  [user] u
where 
  u.id = 1 and -- @userId parameter
  m.id not in 
    (select r.message_id from read_messages r where r.user_id = u.id)

Unfortunately for me I dont understand much looking at the execution plan. /Adam

like image 657
Adam Asham Avatar asked Mar 22 '09 00:03

Adam Asham


People also ask

How do you make a query faster in a database?

The way to make a query run faster is to reduce the number of calculations that the software (and therefore hardware) must perform. To do this, you'll need some understanding of how SQL actually makes calculations.


3 Answers

Suggesting an alternative approach:

I was faced with the exact same problem at work earlier. I wasted a good week in trying to figure the best way to do this. I ended up with creating a join table, as you have done, but the table contains only unread messages, instead of keeping track of read messages.

Because

  1. The status quo is "everyone has read all their messages".
  2. Getting the unread messages (or their count) should be as fast as possible.
  3. The status quo should be the least straining status on the system.

Now, if I would've kept track of all the messages everyone has read, the clutter in the database grows pretty rapidly (users*messages rows), easily leading to thousands of rows of 'dead weight' in even smaller applications. This problem is exaggerated if the lifetime of messages are indefinite - you could be keeping track of message statuses that are many years old.

If keeping track of the inverse, your "unread messages" table contains only a handful of rows, and they diminish for each message that a user reads. Also, getting the amount of unread messages is as simple as "SELECT COUNT(*) FROM unread WHERE user = foo".

But

As everything, this is a trade-off. While reading is pretty much as fast as computationally possible, writing is a chore. For each written message, you need to insert an entry to this join table. Additionally, if multiple people can read the same message, you need to insert one row for each recipient. If the recipients are implicit (e.g. only a user group's name is given, or even with the criteria such as "anyone who has access to this thing"), creating new messages becomes even more complicated.

But I feel this is a fair compromise.

YMMV, HTH.

like image 179
Henrik Paul Avatar answered Nov 05 '22 10:11

Henrik Paul


NOT IN is very expensive. Instead, you could do something like:

SELECT
  m.id, m.title, m.sentAt
FROM
  message m
  LEFT JOIN [Read Messages] rm
    ON m.message_id = rm.message_id AND rm.user_id = @userID
WHERE
  rm.user_id IS NULL

Provided you have the right indexes, this should be much faster.

You're fetching all messages and LEFT JOINing read messages for that user. Then, in the WHERE clause you're asking for that user_id to be NULL for that message, meaning the user has not read it yet.

like image 22
Seb Avatar answered Nov 05 '22 08:11

Seb


HP's alternative suggestion is most likely suiting you well. However, if it is not, Id suggest using NOT EXISTS rather than LEFT JOIN if it's possible in your environment.

At least on MS SQL it would give a slightly cheaper query plan, since it would not need the last filtering (user_id IS NULL)

SELECT  ...
FROM    message m
WHERE   NOT EXISTS (
    SELECT 1
    FROM    read_messages rm
    WHERE   rm.usr_id = ...
    AND rm.msg_id = m.msg_id
)
like image 38
Brimstedt Avatar answered Nov 05 '22 10:11

Brimstedt