Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Scaling a High Score Database

Tags:

mysql

I have a simple high score service for an online game, and it has become more popular than expected. The high score is a webservice which uses a MYSQL backend with a simple table as shown below. Each high score record is stored as a row in this table. The problem is, with >140k rows, I see certain key queries slowing down so much that it will soon be too slow to service requests.

The main table looks like this:

  • id is a unique key for each score record
  • game is the ID number of the game which submitted the score (currently, always equal to "1", soon will have to support more games though)
  • name is the display name for that player's submission
  • playerId is a unique ID for a given user
  • score is a numeric score representation ex 42,035
  • time is the submission time
  • rank is a large integer which uniquely sorts the score submissions for a given game. It is common for people to tie at a certain score, so in that case the tie is broken by who submitted first. Therefore this field's value is equal roughly to "score * 100000000 + (MAX_TIME - time)"
+----------+---------------+------+-----+---------+----------------+
| Field    | Type          | Null | Key | Default | Extra          |
+----------+---------------+------+-----+---------+----------------+
| id       | int(11)       | NO   | PRI | NULL    | auto_increment |
| game     | int(11)       | YES  | MUL | NULL    |                |
| name     | varchar(100)  | YES  |     | NULL    |                |
| playerId | varchar(50)   | YES  |     | NULL    |                |
| score    | int(11)       | YES  |     | NULL    |                |
| time     | datetime      | YES  |     | NULL    |                |
| rank     | decimal(50,0) | YES  | MUL | NULL    |                |
+----------+---------------+------+-----+---------+----------------+

The indexes look like this:

+-----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table     | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| pozscores |          0 | PRIMARY  |            1 | id          | A         |      138296 |     NULL | NULL   |      | BTREE      |         |
| pozscores |          0 | game     |            1 | game        | A         |        NULL |     NULL | NULL   | YES  | BTREE      |         |
| pozscores |          0 | game     |            2 | rank        | A         |        NULL |     NULL | NULL   | YES  | BTREE      |         |
| pozscores |          1 | rank     |            1 | rank        | A         |      138296 |     NULL | NULL   | YES  | BTREE      |         |
+-----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+

When a user requests high scores, they typically request around 75 high scores from an arbitrary point in the "sorted by rank descending list". These requests are typically for "alltime" or just for scores in the past 7 days.

A typical query looks like this: "SELECT * FROM scoretable WHERE game=1 AND time>? ORDER BY rank DESC LIMIT 0, 75;" and runs in 0.00 sec.

However, if you request towards the end of the list "SELECT * FROM scoretable WHERE game=1 AND time>? ORDER BY rank DESC LIMIT 10000, 75;" and runs in 0.06 sec.

"SELECT * FROM scoretable WHERE game=1 AND time>? ORDER BY rank DESC LIMIT 100000, 75;" and runs in 0.58 sec.

It seems like this will quickly start taking way too long as several thousand new scores are submitted each day!

Additionally, there are two other types of queries, used to find a particular player by id in the rank ordered list. They look like this:

"SELECT * FROM scoretable WHERE game=1 AND time>? AND playerId=? ORDER BY rank DESC LIMIT 1"

followed by a

"SELECT count(id) as count FROM scoretable WHERE game=1 AND time>? AND rank>[rank returned from above]"

My question is: What can be done to make this a scalable system? I can see the number of rows growing to be several million very soon. I was hoping that choosing some smart indexes would help, but the improvement has only been marginal.

Update: Here is an explain line:

mysql> explain SELECT * FROM scoretable WHERE game=1 AND time>0 ORDER BY rank DESC LIMIT 100000, 75;
+----+-------------+-----------+-------+---------------+------+---------+------+--------+-------------+
| id | select_type | table     | type  | possible_keys | key  | key_len | ref  | rows   | Extra       |
+----+-------------+-----------+-------+---------------+------+---------+------+--------+-------------+
|  1 | SIMPLE      | scoretable| range | game          | game | 5       | NULL | 138478 | Using where |
+----+-------------+-----------+-------+---------------+------+---------+------+--------+-------------+

Solution Found!

I have solved the problem thanks to some of the pointers from this thread. Doing a clustered index was exactly what I needed, so I converted the table to use InnoDB in mysql, which supports clustered indexes. Next, I removed the id field, and just set the primary key to be (game ASC, rank DESC). Now, all queries run super fast, no matter what offset I use. The explain shows that no additional sorting is being done, and it looks like it is easily handling all the traffic.

like image 415
Jake Poznanski Avatar asked Feb 01 '11 03:02

Jake Poznanski


1 Answers

Seeing as how there are no takers, I'll give it a shot. I am from an SQL Server background, but the same ideas apply.

Some general observations:

  • The ID column is pretty much pointless, and should not participate in any indexes unless there are other tables/queries you're not telling us about. In fact, it doesn't even need to be in your last query. You can do COUNT(*).
  • Your clustered index should target your most common queries. Therefore, a clustered index on game ASC, time DESC, and rank DESC works well. Sorting by time DESC is usually a good idea for historical tables like this where you are usually interested in the most recent stuff. You may also try a separate index with the rank sorted the other direction, though I'm not sure how much of a benefit this will be.
  • Are you sure you need SELECT *? If you can select fewer columns, you may be able to create an index which contains all columns needed for your SELECT and WHERE.

1 million rows is really not that much. I created a table like yours with 1,000,000 rows of sample data, and even with the one index (game ASC, time DESC, and rank DESC), all queries ran in less than 1 second.

(The only part I'm not sure of is playerId. The queries performed so well that playerId didn't seem to be necessary. Perhaps you can add it at the end of your clustered index.)

like image 56
anon Avatar answered Oct 12 '22 22:10

anon