
How can I speed up a GROUP BY query that already uses indexes?

Tags:

mysql

We have a MyISAM table with approximately 75 million rows that has 5 columns:

id (int),
user_id (int),
page_id (int),
type (enum with 6 strings),
date_created (datetime).

We have a primary index on the id column, a unique index on (user_id, page_id, date_created), and a composite index on (page_id, date_created).
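
For reference, the table described above would look roughly like the sketch below. The enum values, the AUTO_INCREMENT attribute, the NOT NULL constraints, and the unique index name are assumptions made for illustration. Only the index name page_id is taken from the EXPLAIN output.

```sql
-- Approximate schema; anything marked "assumed" is not stated in the question.
CREATE TABLE `table` (
  id           INT NOT NULL AUTO_INCREMENT,              -- AUTO_INCREMENT assumed
  user_id      INT NOT NULL,                             -- NOT NULL assumed
  page_id      INT NOT NULL,
  type         ENUM('a','b','c','d','e','f') NOT NULL,   -- placeholder values
  date_created DATETIME NOT NULL,
  PRIMARY KEY (id),
  UNIQUE KEY user_page_date (user_id, page_id, date_created),  -- name assumed
  KEY page_id (page_id, date_created)
) ENGINE=MyISAM;
```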

The problem is that the query below takes up to 90 seconds to complete:

SELECT SQL_NO_CACHE user_id, count(id) nr 
FROM `table` 
WHERE `page_id`=301 
and `date_created` BETWEEN '2012-01-03' AND '2012-02-03 23:59:59' 
AND page_id<>user_id 
group by `user_id`

This is the EXPLAIN output for this query:

+----+-------------+----------------------------+-------+---------------+---------+---------+------+--------+----------------------------------------------+
| id | select_type | table                      | type  | possible_keys | key     | key_len | ref  | rows   | Extra                                        |
+----+-------------+----------------------------+-------+---------------+---------+---------+------+--------+----------------------------------------------+
|  1 | SIMPLE      | table                      | range | page_id       | page_id | 12      | NULL | 520024 | Using where; Using temporary; Using filesort |
+----+-------------+----------------------------+-------+---------------+---------+---------+------+--------+----------------------------------------------+

EDIT: At ypercube's suggestion I tried adding a new index (page_id, user_id, date_created). However, MySQL does not use it by default, so I had to suggest it to the query optimizer. Here is the new query and its EXPLAIN output:

SELECT SQL_NO_CACHE user_id, count(*) nr 
FROM `table` USE INDEX (usridexp) 
WHERE `page_id`=301 
and `date_created` BETWEEN '2012-01-03' AND '2012-02-03 23:59:59' 
AND page_id<>user_id 
group by `user_id` 
ORDER BY NULL


    +----+-------------+----------------------------+------+---------------+----------+---------+-------+---------+--------------------------+
    | id | select_type | table                      | type | possible_keys | key      | key_len | ref   | rows    | Extra                    |
    +----+-------------+----------------------------+------+---------------+----------+---------+-------+---------+--------------------------+
    |  1 | SIMPLE      | table                      | ref  | usridexp      | usridexp | 4       | const | 3943444 | Using where; Using index |
    +----+-------------+----------------------------+------+---------------+----------+---------+-------+---------+--------------------------+
Overdeath asked Feb 07 '12 11:02



1 Answer

Some changes that may improve the query:

  • Change COUNT(id) to COUNT(*). Since id is (I guess) the PRIMARY KEY and NOT NULL, the results will be identical.

  • Add an ORDER BY NULL after the GROUP BY clause. In MySQL, a GROUP BY operation also sorts the results unless you specify otherwise.

  • The (page_id, date_created) index is probably the best one MySQL can use for this query, but you could also try (page_id, user_id, date_created). (Can you also post the EXPLAIN output if you add this index?)
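
Put together, the three suggestions above might look like the following sketch. The index name usridexp is the one used in the question's edit. Everything else comes from the original query.

```sql
-- Third bullet: the alternative index to try.
ALTER TABLE `table` ADD INDEX usridexp (page_id, user_id, date_created);

-- First two bullets: COUNT(*) instead of COUNT(id), plus ORDER BY NULL
-- to skip the implicit sort that GROUP BY would otherwise perform.
EXPLAIN
SELECT SQL_NO_CACHE user_id, COUNT(*) AS nr
FROM `table`
WHERE page_id = 301
  AND date_created BETWEEN '2012-01-03' AND '2012-02-03 23:59:59'
  AND page_id <> user_id
GROUP BY user_id
ORDER BY NULL;
```

Adding an index to a table of this size rebuilds the index data and may take a while, so it is probably worth trying on a copy first.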


Another thing not related to the performance of this query:

If your (user_id, page_id, date_created) index is UNIQUE and the id is auto-generated (and not used for anything other than the primary key), you can make that unique index the PRIMARY KEY and drop the id column. That is one less index and 4 bytes less per row.
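
A rough, untested sketch of that change is shown below. It assumes no other table or application code references id, that user_id, page_id and date_created are all NOT NULL (which a primary key requires), and that the existing unique index is named user_page_date, a hypothetical name since the question does not give one.

```sql
-- Swap the surrogate key for the natural key in a single table rebuild.
ALTER TABLE `table`
  DROP PRIMARY KEY,
  DROP COLUMN id,
  DROP INDEX user_page_date,  -- hypothetical name of the old unique index
  ADD PRIMARY KEY (user_id, page_id, date_created);
```

Running it as one ALTER TABLE statement rebuilds the table once rather than once per change.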

ypercubeᵀᴹ answered Oct 19 '22 12:10
