
How does Reddit track top posts?

Tags:

reddit

Reddit has different buckets for Top posts: "This Hour", "Today", "This Week", "This Month", "This Year", and "All Time". The best way I can think of to create these lists would be to save each vote with a timestamp, so that you can calculate the score of a post for each bucket. This would be an expensive query, but they could get away with it: Top is the same for all users and doesn't change very often, so the query results could be cached.

This is just my best guess at what's going on. Is this what Reddit actually does, or is there a better way?
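
To make the guess concrete, here is a minimal sketch of the vote-timestamp idea, not Reddit's actual code. The function name `top_posts_by_vote_time`, the `(post_id, direction, cast_at)` vote tuples, and the plain dict used as a cache are all hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def top_posts_by_vote_time(votes, window, now, limit=25):
    """Rank posts by the net votes cast within `window` before `now`."""
    cutoff = now - window
    scores = Counter()
    for post_id, direction, cast_at in votes:
        if cast_at >= cutoff:
            scores[post_id] += direction  # +1 for an upvote, -1 for a downvote
    return [post_id for post_id, _ in scores.most_common(limit)]


# Hypothetical sample data: p1 collected its upvotes hours ago, p2 just now.
now = datetime(2014, 1, 30, 12, 0, tzinfo=timezone.utc)
votes = [
    ("p1", +1, now - timedelta(hours=2)),
    ("p1", +1, now - timedelta(hours=3)),
    ("p1", +1, now - timedelta(hours=4)),
    ("p1", +1, now - timedelta(hours=5)),
    ("p1", -1, now - timedelta(minutes=10)),
    ("p2", +1, now - timedelta(minutes=5)),
    ("p2", +1, now - timedelta(minutes=20)),
]

# Top is identical for every user, so one cached list per bucket is enough.
top_cache = {
    "hour": top_posts_by_vote_time(votes, timedelta(hours=1), now),
    "day": top_posts_by_vote_time(votes, timedelta(days=1), now),
}
```

Every bucket rescans the full vote log here, which is the expensive part the caching is meant to hide.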

tleef asked Jan 30 '14 06:01





1 Answer

First off, "this hour", "today", "this week", etc. all refer to when the submission (link/comment) was created, not when the votes happened. I'll focus on links here, but comments are similarly processed for display on user pages.

Short answer: a bunch of cron jobs pull the relevant time period, sort the links and group them by subreddit, then store cached lists of links for quick perusal.

To elaborate, there's a different cron job for each time period. The "top this hour" job runs much more frequently than the "top this year" job, for example. The first thing each job does is pull down a list of all links from the database that were created in the time period of interest. This gets dumped out to a text file, where a primitive map-reduce system processes the data: the links are grouped and sorted. The final list of results is then put into Cassandra as a simple list of link IDs, which is very quick to look up in-request.
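
The steps of one such job can be sketched roughly as follows. This is an illustration only, not the code in the script linked below: the function name `compute_top_listings`, the window lengths in `PERIODS`, the dict-based link records, and the in-memory `cache` standing in for Cassandra are all assumptions.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

# Hypothetical window lengths; the real jobs may define periods differently.
PERIODS = {
    "hour": timedelta(hours=1),
    "day": timedelta(days=1),
    "week": timedelta(weeks=1),
    "month": timedelta(days=30),
    "year": timedelta(days=365),
}


def compute_top_listings(links, period, now, limit=1000):
    """Return {subreddit: [link_id, ...]} for links created within `period`."""
    cutoff = now - PERIODS[period]
    by_subreddit = defaultdict(list)
    # "Map" step: keep links by creation time (not vote time), keyed by subreddit.
    for link in links:
        if link["created"] >= cutoff:
            by_subreddit[link["subreddit"]].append(link)
    # "Reduce" step: sort each group by score and keep only the link IDs.
    listings = {}
    for subreddit, group in by_subreddit.items():
        group.sort(key=lambda link: link["score"], reverse=True)
        listings[subreddit] = [link["id"] for link in group[:limit]]
    return listings


# Hypothetical sample links.
now = datetime(2014, 1, 30, 12, 0, tzinfo=timezone.utc)
links = [
    {"id": "a", "subreddit": "pics", "score": 50, "created": now - timedelta(minutes=10)},
    {"id": "b", "subreddit": "pics", "score": 900, "created": now - timedelta(hours=5)},
    {"id": "c", "subreddit": "pics", "score": 75, "created": now - timedelta(minutes=30)},
    {"id": "d", "subreddit": "python", "score": 10, "created": now - timedelta(days=3)},
]

cache = {}  # stand-in for the Cassandra store of precomputed ID lists
for period in ("hour", "day", "week"):
    cache[period] = compute_top_listings(links, period, now)
```

A request for "top this week in pics" then reduces to a single lookup such as `cache["week"]["pics"]`, with no sorting done in-request.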

Source: https://github.com/reddit/reddit/blob/master/scripts/compute_time_listings

FWIW, individual votes do have timestamps attached to them, but they're not directly used for tracking Top.

Neil Williams answered Sep 21 '22 11:09
