We have a MyISAM table with approximately 75 milion rows that has 5 columns:
id (int),
user_id(int),
page_id (int),
type (enum with 6 strings)
date_created(datetime).
We have a primary index on the ID column, a unique index (user_id, page_id, date_created) AND a composite index (page_id, date_created)
The problem is that the query below takes up to 90 seconds to complete
SELECT SQL_NO_CACHE user_id, count(id) nr
FROM `table`
WHERE `page_id`=301
and `date_created` BETWEEN '2012-01-03' AND '2012-02-03 23:59:59'
AND page_id<>user_id
group by `user_id`
This is the explain of this query
+----+-------------+----------------------------+-------+---------------+---------+---------+------+--------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------------------------+-------+---------------+---------+---------+------+--------+----------------------------------------------+
| 1 | SIMPLE | table | range | page_id | page_id | 12 | NULL | 520024 | Using where; Using temporary; Using filesort |
+----+-------------+----------------------------+-------+---------------+---------+---------+------+--------+----------------------------------------------+
EDIT: At the suggestion of ypercube I tried adding a new index (page_id, user_id, date_created). However mysql does not use it bu default so i had to suggest it to the query optimizer. Here is the new query and the explain:
SELECT SQL_NO_CACHE user_id, count(*) nr FROM `table` USE INDEX (usridexp) WHERE `page_id`=301 and `date_created` BETWEEN '2012-01-03' AND '2012-02-03 23:59:59' AND page_id<>user_id group by `user_id` ORDER BY NULL
+----+-------------+----------------------------+------+---------------+----------+---------+-------+---------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------------------------+------+---------------+----------+---------+-------+---------+--------------------------+
| 1 | SIMPLE | table | ref | usridexp | usridexp | 4 | const | 3943444 | Using where; Using index |
+----+-------------+----------------------------+------+---------------+----------+---------+-------+---------+--------------------------+
Some changes that may improve the query:
Change COUNT(id)
to COUNT(*)
. Since id
is (I guess) the PRIMARY KEY and NOT NULL
, the results will be identical.
Add an ORDER BY NULL
after ther GROUP BY
clause. In MySQL, a group by operation also sorts the results, unless you specify other wise.
The (page_id, date_created)
is probably the best index that MySQL can use for this query but you could also try (page_id, user_id, date_created)
(can you also post the EXPLAIN if you add this index?)
Another thing not related to the performance of this query:
If your (user_id, page_id, date_created)
is UNIQUE
and the id
is auto generated (and not used for anything else but as a Primary Key), you can make it the PRIMARY KEY
and drop the id
column. One less index and 4 bytes less per row.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With