Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Implementing a Leaderboard

Users on my site create annotations for rap lyrics (example). I want to create a leaderboard to reward the people who create the most annotations.

The leaderboard should track how many annotations each user has created overall, as well as how many he has created in the past week, day, etc.

I have no problem implementing the overall leaderboard:

@users = User.all

<table>
  <tr>
    <th>Contributor</th>
    <th>Annotations</th>
  </tr>
    <% @users.sort_by{|u| u.annotations.size }.reverse.each do |u| %>
      <tr>
        <td><%= u %></td>
        <td><%= u.annotations.size %></td>
      </tr>
    <% end %>
</table>

But when I try to implement (say) the daily scoreboard, I'm repeating code and the operation is painfully slow (because it has to iterate through every annotation in memory rather than relying on database sorting / counting):

<table>
  <tr>
    <th>Contributor</th>
    <th>Annotations</th>
  </tr>
    <% @users.sort_by{|u| u.annotations.select{|a| a.created_at > 1.day.ago }.size }.reverse.each do |u| %>
      <tr>
        <td><%= u %></td>
        <td><%= u.annotations.select{|a| a.created_at > 1.day.ago }.size %></td>
      </tr>
    <% end %>
</table>

What's the best way to implement a daily / weekly scoreboard?

like image 256
Tom Lehman Avatar asked Nov 30 '22 19:11

Tom Lehman


2 Answers

Leaderboards as a whole are a pain to implement. Well, in my experience, the actual implementation is fairly straight-forward its just that they get hard to scale. Often you find yourself having to run many DB queries which are pretty DB intensive. To handle daily/weekly reports one would likely query on a datetime column but then it implies that you have an index on said column. That index is really only useful for the leaderboard queries and it makes all other writes on that table pay the price because the index has to be recalculated.

Another approach is to generate your statistics on a scheduled interval and you write that data to a separate table, which is used by the leaderboard queries. For example, you have a background job that runs every night you run a query (maybe its an expensive one because it doesnt use the datetime index but since its run only once and via a background job the expense is "ok"), that query in turn writes to a statistics table that does have an index on the datetime column, then you rewrite your leaderboard page to hit your pre-calculated statistics. Depending on your needs you might have that cron script also do other data munging and pre-calculation so the leaderboard page has to do as little as computation as necessary.

At this point you have your leaderboard page working and while it hits a table with an index it will still be having to read a large number of rows. This is assuming you have decent traffic. Having an indexed query hit a large number of rows on every page is still expensive. So now you think about implementing page caching, maybe storing data in memcached. That is, since the daily leaderboard data changes at minimum every day, by definition, its expensive to re-run those DB queries on every page view. It makes more sense to cache that daily data in memcached and each page view only hits memcached.

So as you can see its an evolved process. If your traffic is low than you might get away with having no separate table and just having an index on your datetime column. Running sums, counts and averages might be OK. But it doesnt scale. So you then have to think about breaking it out into a more optimized structure. And then you see that running the same query over and over every day while the underlying data doesnt change in 24 hours is expensive so you move to a caching setup. There are lots of moving parts and it can get complicated, well, really just tedious fast.

I am a battle-hardended cynic when it comes to leaderboards and while they are great for game mechanics and motivating people (everyone loves to see a score!) its a pain in the ass to make work at a large scale.

like image 83
Cody Caughlan Avatar answered Dec 06 '22 23:12

Cody Caughlan


Have you considered keeping these statistics in a separate table/model that gets updated by an observer? You're doing a lot of heavy lifting in a view here, which isn't generally good practice.

like image 42
jmcnevin Avatar answered Dec 07 '22 00:12

jmcnevin