
Is it better to maintain a separate count table vs running count query every time?

I'm building a social application that has a follower/following concept similar to Twitter.

From a performance point of view, to find the number of followers and followed users, is it better to maintain a separate table for the counts, or to just run a count query every time?

Update:

Similarly, I have survey functionality where people can vote, and they can only vote Yes or No. Right now I'm storing the votes in a separate table, and I need to show a list of surveys on my homepage with the number of participants, the number of Yes votes, and the number of No votes.

This is similar to the Stack Overflow home page (where they show counts of votes, answers, and views).

asked Aug 23 '11 at 11:08 by firefly


2 Answers

This, like most things, depends on the access patterns, i.e. the way your system will be used. If updates are your main bottleneck, then you should not incur the added overhead of maintaining a counter. If, on the other hand, having the count ready saves you considerable time when reading, or counting every time is simply not feasible, then you should precalculate it.

As a general guideline, do not add tables that exist purely as performance optimizations, like the separate count table you propose, before you have actually measured performance to be a problem. Having a separate count table breaks normalization (as any kind of caching does, since the data is now replicated in two places) and makes the code more complicated, so it should not be done just because the count might be needed.
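As a baseline to measure against, the plain count query can be sketched as follows. This is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for MySQL, and the table name `follows`, its columns, and the helper functions are assumptions, not something from the question.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical `follows` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE follows ("
        " follower_id INTEGER NOT NULL,"
        " followee_id INTEGER NOT NULL,"
        " PRIMARY KEY (follower_id, followee_id))"
    )
    # The primary key already covers lookups by follower_id; this second
    # index lets the database count followers of a user from the index alone.
    conn.execute("CREATE INDEX idx_follows_followee ON follows (followee_id)")
    return conn


def follow(conn, follower_id, followee_id):
    # OR IGNORE makes following the same user twice a no-op.
    conn.execute(
        "INSERT OR IGNORE INTO follows (follower_id, followee_id) VALUES (?, ?)",
        (follower_id, followee_id),
    )


def follower_count(conn, user_id):
    """Number of users following `user_id`, counted on every call."""
    row = conn.execute(
        "SELECT COUNT(*) FROM follows WHERE followee_id = ?", (user_id,)
    ).fetchone()
    return row[0]


def following_count(conn, user_id):
    """Number of users that `user_id` follows, counted on every call."""
    row = conn.execute(
        "SELECT COUNT(*) FROM follows WHERE follower_id = ?", (user_id,)
    ).fetchone()
    return row[0]
```

With both columns indexed, each count only has to scan the index entries for a single user, so it is worth timing this on realistic data before introducing a counter table.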

(All that said, some databases support materialized views / materialized queries that allow you to easily do this kind of caching transparently in the background. Those materialized tables are updated by the database, so the program code does not have to worry about it and also, depending on the sophistication of the query optimizer, can be used to optimize a query transparently.)

Update: The Yes/No vote question is a bit different, as the main purpose is just to track the count, not necessarily the whole information (e.g. who voted yes). So a valid implementation might be to keep track of only the accumulated number of yes and no votes. However, the more information you store (e.g. who voted yes, not just how many), the more you can do with it if you choose to (for instance, on Stack Overflow I can always remove my upvote, something you could not do if you did not track who voted). Again, I would advise against aggregating too early in this case, because you will lose certain information.
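To illustrate keeping the individual votes and aggregating only when reading, here is a minimal sketch. It again uses Python's sqlite3 module as a stand-in for MySQL; the `votes` table, its columns, and the function names are hypothetical.

```python
import sqlite3


def make_votes_db():
    """Create an in-memory database with a hypothetical `votes` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE votes ("
        " survey_id INTEGER NOT NULL,"
        " user_id   INTEGER NOT NULL,"
        " is_yes    INTEGER NOT NULL,"  # 1 = Yes, 0 = No
        " PRIMARY KEY (survey_id, user_id))"  # one vote per user per survey
    )
    return conn


def cast_vote(conn, survey_id, user_id, is_yes):
    # OR REPLACE lets a user change an earlier vote instead of voting twice.
    conn.execute(
        "INSERT OR REPLACE INTO votes (survey_id, user_id, is_yes) VALUES (?, ?, ?)",
        (survey_id, user_id, int(is_yes)),
    )


def retract_vote(conn, survey_id, user_id):
    # Only possible because we stored who voted, not just the totals.
    conn.execute(
        "DELETE FROM votes WHERE survey_id = ? AND user_id = ?",
        (survey_id, user_id),
    )


def survey_totals(conn):
    """Participants, Yes and No counts per survey, in a single grouped query."""
    rows = conn.execute(
        "SELECT survey_id, COUNT(*), SUM(is_yes), SUM(1 - is_yes)"
        " FROM votes GROUP BY survey_id ORDER BY survey_id"
    ).fetchall()
    return {
        survey_id: {"participants": total, "yes": yes, "no": no}
        for survey_id, total, yes, no in rows
    }
```

Note that a survey with no votes does not appear in the result of this sketch; joining against a surveys table would be needed to list those as well.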

answered Sep 27 '22 at 19:09 by Janick Bernet



It depends.

If you have many users, the count query could take quite a long time and load large parts of the table/indexes into memory.

If you use a trigger, you'll lose some time in the write path, so every follow action that fires the trigger will be a little slower.
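A trigger-maintained counter might look roughly like this. The sketch uses SQLite trigger syntax through Python's sqlite3 module (MySQL's trigger syntax differs), and the table, trigger, and function names are all made up for illustration.

```python
import sqlite3

# Hypothetical schema: `follows` holds the relationships and
# `follower_counts` is the counter table that the triggers keep in sync.
SCHEMA = """
CREATE TABLE follows (
    follower_id INTEGER NOT NULL,
    followee_id INTEGER NOT NULL,
    PRIMARY KEY (follower_id, followee_id)
);
CREATE TABLE follower_counts (
    user_id   INTEGER PRIMARY KEY,
    followers INTEGER NOT NULL DEFAULT 0
);
CREATE TRIGGER follows_after_insert AFTER INSERT ON follows
BEGIN
    -- Create the counter row on first follow, then increment it.
    INSERT INTO follower_counts (user_id, followers)
    SELECT NEW.followee_id, 0
    WHERE NOT EXISTS (
        SELECT 1 FROM follower_counts WHERE user_id = NEW.followee_id
    );
    UPDATE follower_counts SET followers = followers + 1
    WHERE user_id = NEW.followee_id;
END;
CREATE TRIGGER follows_after_delete AFTER DELETE ON follows
BEGIN
    UPDATE follower_counts SET followers = followers - 1
    WHERE user_id = OLD.followee_id;
END;
"""


def make_trigger_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def follow(conn, follower_id, followee_id):
    # Each insert also runs the trigger body: this is the extra write cost.
    conn.execute(
        "INSERT INTO follows (follower_id, followee_id) VALUES (?, ?)",
        (follower_id, followee_id),
    )


def unfollow(conn, follower_id, followee_id):
    conn.execute(
        "DELETE FROM follows WHERE follower_id = ? AND followee_id = ?",
        (follower_id, followee_id),
    )


def follower_count(conn, user_id):
    """Read the precomputed count; a single primary-key lookup."""
    row = conn.execute(
        "SELECT followers FROM follower_counts WHERE user_id = ?", (user_id,)
    ).fetchone()
    return row[0] if row else 0
```

Reads become a single-row lookup, while every follow or unfollow now performs an additional update on the counter row.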

A mix of the two, asynchronously feeding a statistics table about followers, may give you the best results (fast write operations, extremely fast reads).
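One way to read the asynchronous approach is a statistics table that a background job rebuilds periodically, so that writes stay cheap and reads hit a small precomputed table at the cost of slightly stale numbers. The sketch below assumes that interpretation; it uses Python's sqlite3 module as a stand-in for MySQL, the names `follower_stats` and `refresh_follower_stats` are hypothetical, and the scheduler that would call the refresh (a cron job or queue worker, for example) is left out.

```python
import sqlite3


def make_stats_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE follows (
            follower_id INTEGER NOT NULL,
            followee_id INTEGER NOT NULL,
            PRIMARY KEY (follower_id, followee_id)
        );
        -- Hypothetical statistics table, filled only by the refresh job.
        CREATE TABLE follower_stats (
            user_id   INTEGER PRIMARY KEY,
            followers INTEGER NOT NULL
        );
        """
    )
    return conn


def follow(conn, follower_id, followee_id):
    # The write path touches only `follows`; no counter is updated here.
    conn.execute(
        "INSERT INTO follows (follower_id, followee_id) VALUES (?, ?)",
        (follower_id, followee_id),
    )


def refresh_follower_stats(conn):
    """Rebuild the statistics table in one transaction.

    Meant to be called periodically by a background job, not per request.
    """
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM follower_stats")
        conn.execute(
            "INSERT INTO follower_stats (user_id, followers)"
            " SELECT followee_id, COUNT(*) FROM follows GROUP BY followee_id"
        )


def cached_follower_count(conn, user_id):
    """Read the last computed count; may lag behind `follows`."""
    row = conn.execute(
        "SELECT followers FROM follower_stats WHERE user_id = ?", (user_id,)
    ).fetchone()
    return row[0] if row else 0
```

The count returned by `cached_follower_count` only changes when the refresh runs, so this fits pages where an approximate number is acceptable.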

answered Sep 27 '22 at 21:09 by regilero
