I have some performance issues with a big table on Mysql : The table has got 38 million rows, its size is 3GB. I want to select by testing 2 columns : I tried many indexing (one index for each columns and one index with the 2 columns) but I still have slow query : like below, more than 4 secs to fetch 1644 rows :
SELECT `twstats_twwordstrend`.`id`, `twstats_twwordstrend`.`created`, `twstats_twwordstrend`.`freq`, `twstats_twwordstrend`.`word_id` FROM `twstats_twwordstrend` WHERE (`twstats_twwordstrend`.`word_id` = 1001 AND `twstats_twwordstrend`.`created` > '2011-11-07 14:01:34' );
...
...
...
1644 rows in set (4.66 sec)
EXPLAIN SELECT `twstats_twwordstrend`.`id`, `twstats_twwordstrend`.`created`, `twstats_twwordstrend`.`freq`, `twstats_twwordstrend`.`word_id` FROM `twstats_twwordstrend` WHERE (`twstats_twwordstrend`.`word_id` = 1001 AND `twstats_twwordstrend`.`created` > '2011-11-07 14:01:34' );
+----+-------------+----------------------+-------+-----------------------------------------------------+-----------------------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------------------+-------+-----------------------------------------------------+-----------------------+---------+------+------+-------------+
| 1 | SIMPLE | twstats_twwordstrend | range | twstats_twwordstrend_4b95d890,word_id_created_index | word_id_created_index | 12 | NULL | 1643 | Using where |
+----+-------------+----------------------+-------+-----------------------------------------------------+-----------------------+---------+------+------+-------------+
1 row in set (0.00 sec)
mysql> describe twstats_twwordstrend;
+---------+----------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------+----------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| created | datetime | NO | | NULL | |
| freq | double | NO | | NULL | |
| word_id | int(11) | NO | MUL | NULL | |
+---------+----------+------+-----+---------+----------------+
4 rows in set (0.00 sec)
mysql> show index from twstats_twwordstrend;
+----------------------+------------+-------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+----------------------+------------+-------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| twstats_twwordstrend | 0 | PRIMARY | 1 | id | A | 38676897 | NULL | NULL | | BTREE | | |
| twstats_twwordstrend | 1 | twstats_twwordstrend_4b95d890 | 1 | word_id | A | 655540 | NULL | NULL | | BTREE | | |
| twstats_twwordstrend | 1 | word_id_created_index | 1 | word_id | A | 257845 | NULL | NULL | | BTREE | | |
| twstats_twwordstrend | 1 | word_id_created_index | 2 | created | A | 38676897 | NULL | NULL | | BTREE | | |
+----------------------+------------+-------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
4 rows in set (0.03 sec)
I also discovered that fetching only one row far away in the table is very slow :
mysql> SELECT `twstats_twwordstrend`.`id`, `twstats_twwordstrend`.`created`, `twstats_twwordstrend`.`freq`, `twstats_twwordstrend`.`word_id` FROM `twstats_twwordstrend` limit 10000000,1;
+----------+---------------------+--------------------+---------+
| id | created | freq | word_id |
+----------+---------------------+--------------------+---------+
| 10000001 | 2011-09-09 15:59:18 | 0.0013398539559188 | 41295 |
+----------+---------------------+--------------------+---------+
1 row in set (1.73 sec)
... and not slow at the beginning of the table:
mysql> SELECT `twstats_twwordstrend`.`id`, `twstats_twwordstrend`.`created`, `twstats_twwordstrend`.`freq`, `twstats_twwordstrend`.`word_id` FROM `twstats_twwordstrend` limit 1,1;
+----+---------------------+---------------------+---------+
| id | created | freq | word_id |
+----+---------------------+---------------------+---------+
| 2 | 2011-06-16 10:59:06 | 0.00237777777777778 | 2 |
+----+---------------------+---------------------+---------+
1 row in set (0.00 sec)
The table uses Innodb engine. How can I speed-up queries for big tables ?
The main thing you can do is to add indexes.
Any time that you use a column in a where clause, make sure it has an index. There isn't one on your created column.
The multi-index including the created column in essence is NOT an index on created since created isn't first in the multi-index.
When using multi-indexes, you should almost always put the column with higher cardinality first. So, having the indexes be: (created, word_id)
, (word_id)
would give you a significant boost.
A query with LIMIT 10000000,1
will always be very slow, because it needs to fetch more than 10 million rows (it ignores all except the last one). If your application needs such a query regularly, consider a redesign.
Tables do not have a "beginning" and "end"; they aren't inherently ordered.
It looks to me like you need an index on (word_id
, created
).
You should DEFINITELY performance-test this on a non-production server with production-grade hardware.
Incidentally, a 3Gb database isn't too big nowadays, it will fit in RAM on all but the smallest of servers (You are running a 64-bit OS, right, and have tuned innodb_buffer_pool appropriately? Or your sysadmin did?).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With