Implementing an efficient system of "unread comments" counters

I am trying to find an optimal solution to the following problem: design a PostgreSQL database, with a system of triggers and counters, that can efficiently query, update, and store the number of unread comments on each article (or blog entry, or something similar) displayed on a page.

Every solution that comes to mind has some serious disadvantage in the querying, storing, or updating part: it needs too much storage, or too many updates, or queries that are too costly.
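For concreteness, the most obvious candidate is a per-user counter maintained by a trigger. A minimal sketch of that design (every table and column name here is made up), which illustrates the "too many updates" problem:

CREATE TABLE unread_counts (
    user_id    integer NOT NULL,
    article_id integer NOT NULL,
    unread     integer NOT NULL DEFAULT 0,
    PRIMARY KEY (user_id, article_id)
);

CREATE FUNCTION bump_unread() RETURNS trigger AS $$
BEGIN
    -- Every new comment touches one counter row per user tracking
    -- the article: this is the write amplification I want to avoid.
    UPDATE unread_counts
       SET unread = unread + 1
     WHERE article_id = NEW.article_id;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER comments_bump_unread
    AFTER INSERT ON comments
    FOR EACH ROW EXECUTE PROCEDURE bump_unread();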

What has your experience been? Perhaps there is already a well-established solution to this kind of problem?

Igor Katson asked Jan 17 '09

3 Answers

I would keep the schema as simple as possible, so querying will be as simple as possible. This usually also has the lowest storage requirements. Of course, set indices to support this query.
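For example, if comments live in a single table keyed by article (the names here are hypothetical), the supporting index is a one-liner:

-- Hypothetical schema: lets the unread-count lookup hit an index
-- instead of scanning the whole comments table.
CREATE INDEX comments_by_article_idx ON comments (article_id, created_at);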

Next step: measure the performance! "To measure is to know." What is the response time? What is the load on the server? As long as the performance is acceptable, keep the schema and query simple. Do not sacrifice maintainability if it is not absolutely necessary: your successors will thank you for it later.
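In PostgreSQL, EXPLAIN ANALYZE is the standard measuring tool: it executes the query and reports the plan actually chosen, with real row counts and timings. A usage sketch with placeholder values:

-- Run the query and report the chosen plan with actual timings:
EXPLAIN ANALYZE
SELECT count(*) FROM comments WHERE article_id = 42;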

If performance really is a problem, look at the caching functionality of the framework you are using for your application. NOT performing a query is always faster than performing an optimized one.

wvanbergen answered Sep 25 '22


If you really don't succeed within your resource envelope, maybe you have to tweak the user experience. Perhaps storing the date of last access to a thread is enough.
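A sketch of that compromise (all names here are assumptions): keep one timestamp per user and thread, and redefine "unread" as "posted since my last visit".

CREATE TABLE thread_visits (
    user_id      integer   NOT NULL,
    thread_id    integer   NOT NULL,
    last_visited timestamp NOT NULL,
    PRIMARY KEY (user_id, thread_id)
);

-- Count comments newer than the user's last visit; COALESCE makes
-- a never-visited thread count as entirely unread.
SELECT count(*)
  FROM comments c
 WHERE c.thread_id = ?
   AND c.created_at > COALESCE(
         (SELECT v.last_visited
            FROM thread_visits v
           WHERE v.user_id = ? AND v.thread_id = c.thread_id),
         '-infinity');

The write load drops to one upsert per page view instead of one row per comment read.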

David Schmitt answered Sep 24 '22


I don't believe that the typical, normalised approach would leave you with inefficient queries. Suppose you have a table article_comments with PK (article_id, comment_id) and another table comments_seen_by_user with PK (user_id, article_id, comment_id). All you need to do is, for each article listed on the page:

SELECT count(*) FROM article_comments ac
WHERE article_id = ?                -- Parameter
AND NOT EXISTS (
    SELECT 1 FROM comments_seen_by_user csbu
    WHERE csbu.user_id = ?          -- Parameter
    AND   csbu.article_id = ac.article_id
    AND   csbu.comment_id = ac.comment_id
)
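(For reference, a minimal DDL sketch of those two tables; the integer column types are my assumption. Note that the primary keys already provide exactly the indexes this query needs.)

CREATE TABLE article_comments (
    article_id integer NOT NULL,
    comment_id integer NOT NULL,
    PRIMARY KEY (article_id, comment_id)
);

CREATE TABLE comments_seen_by_user (
    user_id    integer NOT NULL,
    article_id integer NOT NULL,
    comment_id integer NOT NULL,
    PRIMARY KEY (user_id, article_id, comment_id)
);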

If you show 20 articles on a page, you'll run the above query 20 times, and each run will use an index to pull out, say, 10-20 rows from article_comments; the subquery test is just another index scan on comments_seen_by_user. All in all you might have 20 * (20 * 2) = 800 indexed lookups to perform to show a given page. That's no sweat for a modern DB, and I'm probably overlooking even better query plans that PostgreSQL might find.

Have you tried this, and found performance wanting? If so, my first guess would be that you haven't VACUUMed in a while. Otherwise, I must have got my estimates for numbers of articles per page, or comments per article, wrong -- please update with further details in that case.
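If it does turn out to be a vacuuming problem, the fix is a one-liner per table, and it also refreshes the planner's statistics:

VACUUM ANALYZE article_comments;
VACUUM ANALYZE comments_seen_by_user;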

j_random_hacker answered Sep 24 '22