In my application, I have a table of responses to a topic. The structure is roughly as follows:
CREATE TABLE responses (
id INT NOT NULL PRIMARY KEY,
topic_id INT NOT NULL,
author_id INT NOT NULL,
response TEXT
);
id
is an auto-increment field, topic_id
and author_id
are foreign keys, there are the appropriate indices etc.
I always want to order by insertion time, usually by most recent. In most cases, I will be filtering by topic_id
. A typical query looks like this:
SELECT * FROM responses WHERE topic_id=123 ORDER BY id DESC LIMIT 20;
-- or, for pagination:
SELECT * FROM responses WHERE topic_id=123 AND id < 456789 ORDER BY id DESC LIMIT 20;
I want to implement a blocklist - each user has a list of author_id
s they don't want to see. I need to retrieve top 20 results excluding those author_id
s and responses that reply to them.
Determining whether a row should be excluded is pretty complex, and while it would probably be possible to do this in the database (either in PL/SQL or by preprocessing), I want to keep the logic within the application. So I can do one of two things:
What is the practical difference between the two? Esp. in terms of performance with many simultaneous users.
I'm doing this in PostgreSQL, but I'm willing to switch to a different RDBMS. (i don't want to lose referential integrity, so i'm not looking into NoSQL solutions) Perhaps I'd have to tune some parameters of the database (such as prefetch sizes), to make the most of the unbounded query case?
I can't speak to the specifics of Postgres, but it's possible that the query optimizer will use the LIMIT clause as part of the costing of various different execution plans.
If you ...
select ... from ... where ... limit n
then the optimiser knows that you are only going to retrieve n rows, but for ...
select ... from ... where ...
the optimiser might assume that you want the entire result set, which may be estimated at several thousand rows.
In particular I'd expect an RDBMS to favour index-based access methods where LIMIT clauses are applied.
Adding a block-list in SQL isn't difficult.
SELECT * FROM responses
WHERE topic_id=123
AND author_id NOT IN (SELECT author_id FROM blocked WHERE user_id = X)
ORDER BY id DESC LIMIT 20;
Just add a NOT IN to your WHERE clause.
If you have some reason you can't do this then your chunk idea is best. You don't want to have no limit because then the database will return everything to the client or server querying it.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With