I have two tables in a one-to-many relationship. Let's say that for each row in table foo
, there can be 0 or more rows in table bar
that reference the row in foo
.
The client wants to know how many rows in bar
reference a row in foo
, for all the rows in foo
.
I can accomplish this with the following query:
SELECT count(bar_id) FROM bar WHERE bar.foo_id = foo.foo_id;
However, what if the tables foo
and bar
were large? Say foo
has 1 million rows, and bar
has 10 million rows. Let's also say that 99% of rows in foo
would have a count of less than 1,000 bar
rows referencing it. Let's say that the client typically asks for around 100 rows of foo
at a time.
Should I use the naive count() query with an index on the foreign key, or would it be better to keep a counter? Is it even possible to keep a counter? By updating the counter with atomic increments and decrements using a trigger on bar
, I believe it's possible, but I could be wrong.
Perhaps counter-intuitively, you'll probably find that the simple count
approach is faster unless your workload is very biased towards reads.
The reason for this is that the effect of the counter table will be to serialize updates, so only one transaction that's updating a given foo
can be in flight at any given time. That's because the update for the trigger that updates the counter will take a lock on that foo
's entry in the counter table and won't release it until the transaction rolls back or commits.
Worse, if your transaction affects more than one foo
and so does another one, you have a high chance of one of the transactions being aborted due to a deadlock.
Stick to a simple count until you have a good reason to change it.
The sweet thing about indices is that they offer logarithmic complexity for querying operations. Thus, for 10*10^6
rows, the index just needs about ln(10*10^6)=16.1
comparisons to find one specific id. Make it 100 million rows, and you only have to do 2 to 3 comparisons more. In short: The index does not that much care about the size of a table.
Of course, you may still be able to archive some performance gains using a stored counter. That is a typical tradeoff. Maintaining the counter will make insertions and deletions to bar
much more expensive and make your counting query a little bit cheaper.
Thus, if your tables are altered rarely and the query is run frequently (say, thousands of times an hour), you might really gain performance using the stored counter procedure. However, in most cases I would say go for the indexed column and let the database do the rest for you.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With