I have a simple high score service for an online game, and it has become more popular than expected. The high score is a webservice which uses a MYSQL backend with a simple table as shown below. Each high score record is stored as a row in this table. The problem is, with >140k rows, I see certain key queries slowing down so much that it will soon be too slow to service requests.
The main table looks like this:
+----------+---------------+------+-----+---------+----------------+ | Field | Type | Null | Key | Default | Extra | +----------+---------------+------+-----+---------+----------------+ | id | int(11) | NO | PRI | NULL | auto_increment | | game | int(11) | YES | MUL | NULL | | | name | varchar(100) | YES | | NULL | | | playerId | varchar(50) | YES | | NULL | | | score | int(11) | YES | | NULL | | | time | datetime | YES | | NULL | | | rank | decimal(50,0) | YES | MUL | NULL | | +----------+---------------+------+-----+---------+----------------+
The indexes look like this:
+-----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+ | Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | +-----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+ | pozscores | 0 | PRIMARY | 1 | id | A | 138296 | NULL | NULL | | BTREE | | | pozscores | 0 | game | 1 | game | A | NULL | NULL | NULL | YES | BTREE | | | pozscores | 0 | game | 2 | rank | A | NULL | NULL | NULL | YES | BTREE | | | pozscores | 1 | rank | 1 | rank | A | 138296 | NULL | NULL | YES | BTREE | | +-----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
When a user requests high scores, they typically request around 75 high scores from an arbitrary point in the "sorted by rank descending list". These requests are typically for "alltime" or just for scores in the past 7 days.
A typical query looks like this:
"SELECT * FROM scoretable WHERE game=1 AND time>? ORDER BY rank DESC LIMIT 0, 75;"
and runs in 0.00 sec.
However, if you request towards the end of the list
"SELECT * FROM scoretable WHERE game=1 AND time>? ORDER BY rank DESC LIMIT 10000, 75;"
and runs in 0.06 sec.
"SELECT * FROM scoretable WHERE game=1 AND time>? ORDER BY rank DESC LIMIT 100000, 75;"
and runs in 0.58 sec.
It seems like this will quickly start taking way too long as several thousand new scores are submitted each day!
Additionally, there are two other types of queries, used to find a particular player by id in the rank ordered list. They look like this:
"SELECT * FROM scoretable WHERE game=1 AND time>? AND playerId=? ORDER BY rank DESC LIMIT 1"
followed by a
"SELECT count(id) as count FROM scoretable WHERE game=1 AND time>? AND rank>[rank returned from above]"
My question is: What can be done to make this a scalable system? I can see the number of rows growing to be several million very soon. I was hoping that choosing some smart indexes would help, but the improvement has only been marginal.
Update: Here is an explain line:
mysql> explain SELECT * FROM scoretable WHERE game=1 AND time>0 ORDER BY rank DESC LIMIT 100000, 75; +----+-------------+-----------+-------+---------------+------+---------+------+--------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-----------+-------+---------------+------+---------+------+--------+-------------+ | 1 | SIMPLE | scoretable| range | game | game | 5 | NULL | 138478 | Using where | +----+-------------+-----------+-------+---------------+------+---------+------+--------+-------------+
Solution Found!
I have solved the problem thanks to some of the pointers from this thread. Doing a clustered index was exactly what I needed, so I converted the table to use InnoDB in mysql, which supports clustered indexes. Next, I removed the id field, and just set the primary key to be (game ASC, rank DESC). Now, all queries run super fast, no matter what offset I use. The explain shows that no additional sorting is being done, and it looks like it is easily handling all the traffic.
Seeing as how there are no takers, I'll give it a shot. I am from an SQL Server background, but the same ideas apply.
Some general observations:
1 million rows is really not that much. I created a table like yours with 1,000,000 rows of sample data, and even with the one index (game ASC, time DESC, and rank DESC), all queries ran in less than 1 second.
(The only part I'm not sure of is playerId. The queries performed so well that playerId didn't seem to be necessary. Perhaps you can add it at the end of your clustered index.)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With