Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to efficiently get a range of ranked users (for a leaderboard) using Postgresql

Tags:

sql

postgresql

I have read many posts on this topic, such as mysql-get-rank-from-leaderboards.

However, none of the solutions are efficient at scale for getting a range of ranks from the database.

The problem is simple. Suppose we have a Postgres table with an "id" column and another INTEGER column whose values are not unique, but we have an index for this column.

e.g. table could be:

CREATE TABLE my_game_users (id serial PRIMARY KEY, rating INTEGER NOT NULL);

The goal

  • Define a rank for users ordering users on the "rating" column descending
  • Be able to query for a list of ~50 users ordered by this new "rank", centered at any particular user
  • For example, we might return users with ranks { 15, 16, ..., 64, 65 } where the center user has rank #40
  • Performance must scale, e.g. be under 80 ms for 100,000 users.

Attempt #1: row_number() window function

WITH my_ranks AS 
  (SELECT my_game_users.*, row_number() OVER (ORDER BY rating DESC) AS rank
   FROM my_game_users)
SELECT *
FROM my_ranks
WHERE rank >= 4000 AND rank <= 4050
ORDER BY rank ASC;

This "works", but the queries average 550ms with 100,000 users on a fast laptop without any other real work being done.

I tried adding indexes, and re-phrasing this query to not use the "WITH" syntax, and nothing worked to speed it up.

Attempt #2 - count the number of rows with a greater rating value I tried a query like this:

SELECT  t1.*,
  (SELECT  COUNT(*)
   FROM my_game_users t2
   WHERE (t1.rating, -t1.id) <= (t2.rating, -t2.id)
  ) AS rank
FROM my_game_users t1
WHERE id = 2000;

This is decent, this query takes about 120ms with 100,000 users having random ratings. However, this only returns the rank for user with a particular id (2000).

I can't see any efficient way to extend this query to get a range of ranks. Any attempt at extending this makes a very slow query.

I only know the ID of the "center" user, since the users have to be ordered by rank before we know which ones are in the range!

Attempt #3: in-memory ordered Tree

I ended up using a Java TreeSet to store the ranks. I can update the TreeSet whenever a new user is inserted into the database, or a user's rating changes.

This is super fast, around 25 ms with 100,000 users.

However, it has a serious drawback that it's only updated on the Webapp node that serviced the request. I'm using Heroku and will deploy multiple nodes for my app. So, I needed to add a scheduled task for the server to re-build this ranking tree every hour, to make sure the nodes don't get too out-of-sync!

If anyone knows of an efficient way to do this in Postgres with full solution, then I am all ears!

like image 309
Charles Capps Avatar asked Aug 02 '15 00:08

Charles Capps


2 Answers

You can get the same results by using order by rating desc and offset and limit to get users between a certain rank.

WITH my_ranks AS 
    (SELECT my_game_users.*, row_number() OVER (ORDER BY rating DESC) AS rank FROM my_game_users)
SELECT * FROM my_ranks WHERE rank >= 4000 AND rank <= 4050 ORDER BY rank ASC;

The query above is the same as

select * , rank() over (order by rating desc) rank 
from my_game_users 
order by rating desc
limit 50 offset 4000

If you want to select users around rank #40 you could select ranks #15-#65

select *, rank() over (order by rating desc) rank 
from my_game_users 
order by rating desc
limit 50 offset 15
like image 185
FuzzyTree Avatar answered Oct 02 '22 23:10

FuzzyTree


Thanks, @FuzzyTree ! Your solution doesn't quite give me everything I need, but it nudged me in the right direction. Here's the full solution I'm going with for now.

The only limitation with your solution is that there's no way to get a unique rank for a particular user. All users with the same rating would have the same rank (or at least it is undefined by SQL standard). If I knew the OFFSET ahead of time, then your rank would be good enough, but I have to get the rank of a particular user first.

My solution is to do the following query to get a range of ranks:

SELECT * FROM my_game_users ORDER BY rating DESC, id ASC LIMIT ? OFFSET ?

This is basically uniquely defining the ranks by rating, then by who joined the Game first (lower id). To make this efficient I'm creating an index on (rating DESC, id)

Then, I'm getting a particular user's rank to plug in to this query with:

SELECT COUNT(*) FROM my_game_users WHERE rating > ? OR (rating = ? AND id < ?)

I actually made this more efficient with:

SELECT (SELECT COUNT(*) FROM my_game_users WHERE rating > ?) + (SELECT COUNT(*) FROM my_game_users WHERE rating = ? AND id < ?) + 1

Now, even with these queries it takes about 78ms average and median time to get the ranks around a user. If anyone has a good idea how to speed these up I'm all ears!

For example, getting a range of ranks takes about 60ms, and explaining it yields:

EXPLAIN SELECT * FROM word_users ORDER BY rating DESC, id ASC LIMIT 50 OFFSET 50000;

"Limit (cost=6350.28..6356.63 rows=50 width=665)" " -> Index Scan using idx_rating_desc_and_id on word_users (cost=0.29..12704.83 rows=100036 width=665)"

So, it's using the rating and id index, yet it still has this highly variable cost from 0.29...12704.83. Any ideas how to improve??

like image 25
Charles Capps Avatar answered Oct 02 '22 23:10

Charles Capps