You land on a page full of comments. The voting system of the comments is highlighted with your votes.
To support this requirement, the database schema would be the following, at a minimum:
Page
Comments
Votes
If the page ID were 123 and the user ID were 456, this would be the naive implementation of the controller:
1) Query for all the votes made by user 456 on the comments of page 123:
SELECT c.commentId, v.direction
FROM comments AS c, votes AS v
WHERE c.pageId = 123
AND c.commentId = v.commentId
AND v.userId = 456
2) Construct the view with the results of this query.
The database query to support this voting system is very expensive. The comments and votes table will be huge. On a high traffic site, thousands of users will be executing this query every second to receive a personalized view of the comment voting. How do you scale this voting system so the database does not become overloaded with too many requests? Would you cache it in memory? Isn't it bet practice to only cache things common to a large audience? In this case, these queries are specific to single users. Memory would quickly be filled up on a website with millions of users. Cache misses would occur and the database would be thrashed.
I think Reddit will cache/store (for each comment) the list of users that up voted it (and another for down votes) and only update this cache every X seconds/minutes/hours depending on activity. The list will be organised such that a binary search is possible.
Then when generating the page the server only needs to say check whether the current user id is "in the list" of up/down voters for each comment. Reddit also limits the number of comments initially visible, which will reduce the number of tests required.
Reddit also doesn't update votes immediately (they add the vote to a queue). They can tie together the processing of the queue and the caching of the votes.
I assume reddit must also keep track of recent votes on a per user basis so that they can fill in the gaps between the last cache update and now.
This might not be 100% accurate.
It is based on limited reading about Reddit architecture and what I would do.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With