I am trying to find an optimal solution for the following problem: I need to design a PostgreSQL database, with its system of triggers and counters, that can efficiently query, update and store information on how many unread comments exist for each article (or blog entry, or something similar) displayed on a page.
Every solution that comes to mind has some serious disadvantage in either the querying, the storing, or the updating part: it needs too much storage, too many updates, or too costly queries.
What is your experience? Is there already a well-established solution for this kind of problem?
I would keep the schema as simple as possible, so that querying is as simple as possible. This usually also has the lowest storage requirements. Of course, add indexes to support that query.
Next step: measure the performance! "To measure is to know." What is the response time? What is the load on the server? As long as the performance is acceptable, keep the schema and query simple. Do not sacrifice maintainability if it is not absolutely necessary: your successors will thank you for it later.
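One practical way to measure in PostgreSQL is EXPLAIN ANALYZE, which shows both the chosen plan and the actual timings. For example (the table and column names here are only placeholders, not part of any schema discussed so far):

EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*)
FROM comments                 -- placeholder table name
WHERE article_id = 42;        -- placeholder article id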
If performance really is a problem, look at the caching functionality of the framework you are using for your application. NOT performing a query is always faster than performing an optimized one.
If you really don't succeed within your resource envelope, maybe you have to tweak the user experience. Perhaps storing the date of last access to a thread is enough.
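A rough sketch of that idea, assuming a table that only records when a user last opened an article, plus a comments table with a created_at column (all names and types here are illustrative):

-- One row per (user, article): when did this user last open the article?
CREATE TABLE article_last_access (
    user_id     bigint      NOT NULL,
    article_id  bigint      NOT NULL,
    accessed_at timestamptz NOT NULL,
    PRIMARY KEY (user_id, article_id)
);

-- "Unread" then simply means "posted after my last visit to the article".
SELECT count(*)
FROM comments c
LEFT JOIN article_last_access ala
       ON ala.article_id = c.article_id
      AND ala.user_id    = ?                 -- current user
WHERE c.article_id = ?                       -- article shown on the page
  AND (ala.accessed_at IS NULL OR c.created_at > ala.accessed_at);

This trades per-comment precision for a single small row per user and article, which is often an acceptable compromise.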
I don't believe that the typical, normalised approach would leave you with inefficient queries. Suppose you have a table article_comments with PK (article_id, comment_id), and another table comments_seen_by_user with PK (user_id, article_id, comment_id). All you need to do is, for each article listed on the page:
SELECT count(*)
FROM article_comments ac
WHERE ac.article_id = ?              -- Parameter
  AND NOT EXISTS (
      SELECT 1
      FROM comments_seen_by_user csbu
      WHERE csbu.user_id = ?         -- Parameter
        AND csbu.article_id = ac.article_id
        AND csbu.comment_id = ac.comment_id
  )
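For reference, a minimal sketch of the two tables assumed above (the bigint column types are my guess; the primary keys are the ones described):

CREATE TABLE article_comments (
    article_id bigint NOT NULL,
    comment_id bigint NOT NULL,
    PRIMARY KEY (article_id, comment_id)
);

CREATE TABLE comments_seen_by_user (
    user_id    bigint NOT NULL,
    article_id bigint NOT NULL,
    comment_id bigint NOT NULL,
    PRIMARY KEY (user_id, article_id, comment_id)
);

The primary-key indexes are exactly what the query needs: the outer scan filters on article_id, the leading column of article_comments' key, and the NOT EXISTS probe hits the full key of comments_seen_by_user.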
If you show 20 articles on a page, you'll run the above query 20 times, and each run will use an index to pull out, say, 10-20 rows from article_comments, while the subquery test is just another index scan on comments_seen_by_user. So all in all you might have 20 * (20 * 2) = 800 indexed lookups to perform to show a given page. That's no sweat to a modern DB. And I'm probably overlooking even better query plans that PostgreSQL might find.
Have you tried this, and found performance wanting? If so, my first guess would be that you haven't VACUUMed in a while. Otherwise, I must have got my estimates for numbers of articles per page, or comments per article, wrong -- please update with further details in that case.
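If dead rows or stale statistics do turn out to be the culprit, a quick manual pass over the two tables would be something like:

VACUUM ANALYZE article_comments;
VACUUM ANALYZE comments_seen_by_user;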