Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How To Sort Like Hacker News

I am trying to program a plugin to bbPress (the open source forum software) that will work similar to Hacker News (http://news.ycombinator.com/).

Specifically, I want to sort the order of forum-threads (bbPress calls them "topics") using the following algorithm:

sort_value = (p - 1) / (t + 2)^1.5
where p = total votes for each topic from users
t = time since submission of each topic in hours

I would like to be able to sort topics by this calculated sort_value using MySQL.

The relevant fields in the topics table looks something like this:

topic_id            bigint(20)
topic_start_time    datetime

This is up in the air, but I was thinking there will be another table that stores individual votes by users so we'll be able to know whether a user has voted already. And another table will store the current vote-totals for each topic. Maybe there will be another field in that table storing the latest calculated sort_Value?

To be 100% accurate, the sort_value should be updated after each new vote. This would add too much load to the database server, though, especially if we tried to update ALL the topics. If we have to, we could limit the dataset by only calculating the sort_value for the last X # of topics. We could also limit the load by only updating the sort_value periodically (e.g. every 5 minutes via a cron job).

These shortcuts might make the load acceptable, but I would prefer a more elegant solution that could scale better.

How would you structure this? :-)

like image 874
bobbyh Avatar asked Jun 04 '09 00:06

bobbyh


1 Answers

There are a number of tradeoffs to consider in this. You've hinted at them already in your question. Timeliness and Exactness vs Load and Scale.

Batching the calculations is the best way to decrease Load and increase Scale if Timeliness and Exactness aren't necessary and the system experiences a high load of writes.

You really have to sort of examine the the useage of the system and determine what areas you need to optimize for. Optimizing for Write has different constraints than optimizing for Reads. Same for timeliness or Exactness of the data.

Determine which ones are most important for your application and make the appropriate tradeoff.

like image 62
Jeremy Wall Avatar answered Nov 04 '22 14:11

Jeremy Wall