I am attempting to speed up a query that takes around 60 seconds to complete on a table of ~20 million rows.
For this example, the table has three columns (id, dateAdded, name). id is the primary key. The indexes I have added to the table are:
(dateAdded)
(name)
(id, name)
(id, name, dateAdded)
The query I am trying to run is:
SELECT MAX(id) as id, name
FROM exampletable
WHERE dateAdded <= '2014-01-20 12:00:00'
GROUP BY name
ORDER BY NULL;
The date is variable from query to query.
The objective is to get the most recent entry (the highest id) for each name among the rows added at or before the given date.
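To make the goal concrete, here is a minimal sketch using made-up rows. The table name `example_demo` and all of the data are hypothetical and exist only for illustration; the query itself is the one shown above.

```sql
-- Hypothetical scratch table mirroring the three columns described above.
CREATE TEMPORARY TABLE example_demo (
  id INT NOT NULL PRIMARY KEY,
  dateAdded TIMESTAMP NULL,
  name VARCHAR(50)
);

-- Made-up rows: 'alpha' has one entry after the cutoff, 'beta' has none.
INSERT INTO example_demo (id, dateAdded, name) VALUES
  (1, '2014-01-19 09:00:00', 'alpha'),
  (2, '2014-01-20 11:00:00', 'alpha'),
  (3, '2014-01-20 13:00:00', 'alpha'),
  (4, '2014-01-18 08:00:00', 'beta');

-- Same shape as the real query: highest id per name at or before the cutoff.
SELECT MAX(id) AS id, name
FROM example_demo
WHERE dateAdded <= '2014-01-20 12:00:00'
GROUP BY name;
-- Rows returned (in some order): (2, 'alpha') and (4, 'beta')
```

Row 3 is excluded because its dateAdded falls after the cutoff, so id 2 is the latest qualifying entry for 'alpha'.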
When I run EXPLAIN on the exampletable query, it reports that it is using the (id, name, dateAdded) index:
+----+-------------+--------------+-------+------------------+----------------------------------+---------+------+----------+-----------------------------------------------------------+
| id | select_type | table        | type  | possible_keys    | key                              | key_len | ref  | rows     | Extra                                                     |
+----+-------------+--------------+-------+------------------+----------------------------------+---------+------+----------+-----------------------------------------------------------+
|  1 | SIMPLE      | exampletable | index | date_added_index | id_element_name_date_added_index | 162     | NULL | 22016957 | Using where; Using index; Using temporary; Using filesort |
+----+-------------+--------------+-------+------------------+----------------------------------+---------+------+----------+-----------------------------------------------------------+
Edit: Added two new indexes suggested in the comments, followed by the new EXPLAIN output:
(dateAdded, name, id)
(name, id)
+----+-------------+--------------+-------+-------------------------------------------+---------------------------+---------+------+----------+-------------------------------------------+
| id | select_type | table        | type  | possible_keys                             | key                       | key_len | ref  | rows     | Extra                                     |
+----+-------------+--------------+-------+-------------------------------------------+---------------------------+---------+------+----------+-------------------------------------------+
|  1 | SIMPLE      | exampletable | index | date_added_index,date_added_name_id_index | id__name_date_added_index | 162     | NULL | 22040469 | Using where; Using index; Using temporary |
+----+-------------+--------------+-------+-------------------------------------------+---------------------------+---------+------+----------+-------------------------------------------+
Edit: Added create table script.
CREATE TABLE `exampletable` (
  `id` int(10) NOT NULL auto_increment,
  `dateAdded` timestamp NULL default CURRENT_TIMESTAMP,
  `name` varchar(50) character set utf8 default '',
  PRIMARY KEY (`id`),
  KEY `date_added_index` (`dateAdded`),
  KEY `name_index` USING BTREE (`name`),
  KEY `id_name_index` USING BTREE (`id`,`name`),
  KEY `id_name_date_added_index` USING BTREE (`id`,`dateAdded`,`name`),
  KEY `date_added_name_id_index` USING BTREE (`dateAdded`,`name`,`id`),
  KEY `name_id_index` USING BTREE (`name`,`id`)
) ENGINE=MyISAM AUTO_INCREMENT=22046064 DEFAULT CHARSET=latin1
Edit: Here is the EXPLAIN output for the query from the answer provided by HeavyE.
+----+-------------+--------------+-------+------------------------------------------------------------------------------------------+--------------------------+---------+--------------------------------------------------+------+---------------------------------------+
| id | select_type | table        | type  | possible_keys                                                                            | key                      | key_len | ref                                              | rows | Extra                                 |
+----+-------------+--------------+-------+------------------------------------------------------------------------------------------+--------------------------+---------+--------------------------------------------------+------+---------------------------------------+
|  1 | PRIMARY     | <derived2>   | ALL   | NULL                                                                                     | NULL                     | NULL    | NULL                                             | 1732 | Using temporary; Using filesort       |
|  1 | PRIMARY     | example1     | ref   | date_added_index,name_index,date_added_name_id_index,name_id_index,name_date_added_index | date_added_name_id_index | 158     | maxDateByElement.dateAdded,maxDateByElement.name |    1 | Using where; Using index              |
|  2 | DERIVED     | exampletable | range | date_added_index,date_added_name_id_index                                                | name_date_added_index    | 158     | NULL                                             | 1743 | Using where; Using index for group-by |
+----+-------------+--------------+-------+------------------------------------------------------------------------------------------+--------------------------+---------+--------------------------------------------------+------+---------------------------------------+
There is a great Stack Overflow post on optimizing queries that select rows with the max value in a column: https://stackoverflow.com/a/7745635/633063
This seems a little messy but works very well:
SELECT example1.name, MAX(example1.id) AS id
FROM exampletable example1
INNER JOIN (
    -- latest dateAdded per name at or before the cutoff
    SELECT name, MAX(dateAdded) AS dateAdded
    FROM exampletable
    WHERE dateAdded <= '2014-01-20 12:00:00'
    GROUP BY name
) maxDateByElement
    ON example1.name = maxDateByElement.name
    AND example1.dateAdded = maxDateByElement.dateAdded
GROUP BY example1.name;
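The EXPLAIN output for this query shows the derived table reading from an index called name_date_added_index, which does not appear in the CREATE TABLE script in the question. The following is only a sketch of how such an index might be added; the column order (name, dateAdded) is an assumption inferred from the index name, not something the question confirms.

```sql
-- Assumption: the column order is guessed from the index name seen in EXPLAIN.
ALTER TABLE exampletable
  ADD INDEX name_date_added_index (name, dateAdded);

-- Re-check the inner query on its own. If the index is usable, the Extra
-- column may show "Using index for group-by", as in the output in the question.
EXPLAIN
SELECT name, MAX(dateAdded) AS dateAdded
FROM exampletable
WHERE dateAdded <= '2014-01-20 12:00:00'
GROUP BY name;
```

Note that the join first finds the latest dateAdded per name and then takes the highest id among rows sharing that timestamp. That matches the original MAX(id) query only if id increases along with dateAdded, which seems likely for an auto_increment key but is worth verifying on the real data.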