Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Category with lot of pages (huge offsets) (how does stackoverflow work?)

Tags:

database

mysql

I think that my question can be solved by just knowing how, for example, stackoverflow works.

For example, this page, loads in a few ms (< 300ms): https://stackoverflow.com/questions?page=61440&sort=newest

The only query i can think about for that page is something like SELECT * FROM stuff ORDER BY date DESC LIMIT {pageNumber}*{stuffPerPage}, {pageNumber}*{stuffPerPage}+{stuffPerPage}

A query like that might take several seconds to run, but the stack overflow page loads almost in no time. It can't be a cached query, since that question are posted over time and rebuild the cache every time a question is posted is simply madness.

So, how do this works in your opinion?

(to make the question easier, let's forget about the ORDER BY) Example (the table is fully cached in ram and stored in an ssd drive)

mysql> select * from thread limit 1000000, 1;
1 row in set (1.61 sec)

mysql> select * from thread limit 10000000, 1;
1 row in set (16.75 sec)

mysql> describe select * from thread limit 1000000, 1;
+----+-------------+--------+------+---------------+------+---------+------+----------+-------+
| id | select_type | table  | type | possible_keys | key  | key_len | ref  | rows     | Extra |
+----+-------------+--------+------+---------------+------+---------+------+----------+-------+
|  1 | SIMPLE      | thread | ALL  | NULL          | NULL | NULL    | NULL | 64801163 |       |
+----+-------------+--------+------+---------------+------+---------+------+----------+-------+

mysql> select * from thread ORDER BY thread_date DESC limit 1000000, 1;
1 row in set (1 min 37.56 sec)


mysql> SHOW INDEXES FROM thread;
+--------+------------+----------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table  | Non_unique | Key_name | Seq_in_index | Column_name  | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+--------+------------+----------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| thread |          0 | PRIMARY  |            1 | newsgroup_id | A         |      102924 |     NULL | NULL   |      | BTREE      |         |               |
| thread |          0 | PRIMARY  |            2 | thread_id    | A         |    47036298 |     NULL | NULL   |      | BTREE      |         |               |
| thread |          0 | PRIMARY  |            3 | postcount    | A         |    47036298 |     NULL | NULL   |      | BTREE      |         |               |
| thread |          0 | PRIMARY  |            4 | thread_date  | A         |    47036298 |     NULL | NULL   |      | BTREE      |         |               |
| thread |          1 | date     |            1 | thread_date  | A         |    47036298 |     NULL | NULL   |      | BTREE      |         |               |
+--------+------------+----------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
5 rows in set (0.00 sec)
like image 280
cedivad Avatar asked Aug 20 '11 11:08

cedivad


1 Answers

Create a BTREE index on date column and the query will run in a breeze.

CREATE INDEX date ON stuff(date) USING BTREE

UPDATE: Here is a test I just did:

CREATE TABLE test( d DATE, i INT, INDEX(d) );

Filled the table with 2,000,000 rows with different unique is and ds

mysql> SELECT * FROM test LIMIT 1000000, 1;
+------------+---------+
| d          | i       |
+------------+---------+
| 1897-07-22 | 1000000 |
+------------+---------+
1 row in set (0.66 sec)

mysql> SELECT * FROM test ORDER BY d LIMIT 1000000, 1;
+------------+--------+
| d          | i      |
+------------+--------+
| 1897-07-22 | 999980 |
+------------+--------+
1 row in set (1.68 sec)

And here is an interesiting observation:

mysql> EXPLAIN SELECT * FROM test ORDER BY d LIMIT 1000, 1;
+----+-------------+-------+-------+---------------+------+---------+------+------+-------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | Extra |
+----+-------------+-------+-------+---------------+------+---------+------+------+-------+
|  1 | SIMPLE      | test  | index | NULL          | d    | 4       | NULL | 1001 |       |
+----+-------------+-------+-------+---------------+------+---------+------+------+-------+

mysql> EXPLAIN SELECT * FROM test ORDER BY d LIMIT 10000, 1;
+----+-------------+-------+------+---------------+------+---------+------+---------+----------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows    | Extra          |
+----+-------------+-------+------+---------------+------+---------+------+---------+----------------+
|  1 | SIMPLE      | test  | ALL  | NULL          | NULL | NULL    | NULL | 2000343 | Using filesort |
+----+-------------+-------+------+---------------+------+---------+------+---------+----------------+

MySql does use the index for OFFSET 1000 but not for 10000.

Even more interesting, if I do FORCE INDEX query takes more time:

mysql> SELECT * FROM test FORCE INDEX(d) ORDER BY d LIMIT 1000000, 1;
+------------+--------+
| d          | i      |
+------------+--------+
| 1897-07-22 | 999980 |
+------------+--------+
1 row in set (2.21 sec)
like image 145
nobody Avatar answered Nov 15 '22 09:11

nobody