I have a table that looks like this:
CREATE TABLE `metric` (
`metricid` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`host` varchar(50) NOT NULL,
`userid` int(10) unsigned DEFAULT NULL,
`lastmetricvalue` double DEFAULT NULL,
`receivedat` int(10) unsigned DEFAULT NULL,
`name` varchar(255) NOT NULL,
`sampleid` tinyint(3) unsigned NOT NULL,
`type` tinyint(3) unsigned NOT NULL DEFAULT '0',
`lastrawvalue` double NOT NULL,
`priority` tinyint(3) unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`metricid`),
UNIQUE KEY `unique-metric` (`userid`,`host`,`name`,`sampleid`)
) ENGINE=InnoDB AUTO_INCREMENT=1000000221496 DEFAULT CHARSET=utf8
It has 177,892 rows at the moment, and when I run the following query:
select metricid, lastrawvalue, receivedat, name, sampleid
FROM metric m
WHERE m.userid = 8
AND (host, name, sampleid) IN (('localhost','0.4350799184758216cpu-3/cpu-nice',0),
('localhost','0.4350799184758216cpu-3/cpu-system',0),
('localhost','0.4350799184758216cpu-3/cpu-idle',0),
('localhost','0.4350799184758216cpu-3/cpu-wait',0),
('localhost','0.4350799184758216cpu-3/cpu-interrupt',0),
('localhost','0.4350799184758216cpu-3/cpu-softirq',0),
('localhost','0.4350799184758216cpu-3/cpu-steal',0),
('localhost','0.4350799184758216cpu-4/cpu-user',0),
('localhost','0.4350799184758216cpu-4/cpu-nice',0),
('localhost','0.4350799184758216cpu-4/cpu-system',0),
('localhost','0.4350799184758216cpu-4/cpu-idle',0),
('localhost','0.4350799184758216cpu-4/cpu-wait',0),
('localhost','0.4350799184758216cpu-4/cpu-interrupt',0),
('localhost','0.4350799184758216cpu-4/cpu-softirq',0),
('localhost','0.4350799184758216cpu-4/cpu-steal',0),
('localhost','_util/billing-bytes',0),('localhost','_util/billing-metrics',0));
it takes 0.87 seconds to return results, explain is:
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: m
type: ref
possible_keys: unique-metric
key: unique-metric
key_len: 5
ref: const
rows: 85560
Extra: Using where
1 row in set (0.00 sec)
profile looks like this:
+--------------------------------+----------+
| Status | Duration |
+--------------------------------+----------+
| starting | 0.000160 |
| checking permissions | 0.000010 |
| Opening tables | 0.000021 |
| exit open_tables() | 0.000008 |
| System lock | 0.000008 |
| mysql_lock_tables(): unlocking | 0.000005 |
| exit mysqld_lock_tables() | 0.000007 |
| init | 0.000068 |
| optimizing | 0.000018 |
| statistics | 0.000091 |
| preparing | 0.000042 |
| executing | 0.000005 |
| Sending data | 0.870180 |
| innobase_commit_low():trx_comm | 0.000012 |
| Sending data | 0.000111 |
| end | 0.000009 |
| query end | 0.000009 |
| ha_commit_one_phase(-1) | 0.000015 |
| innobase_commit_low():trx_comm | 0.000004 |
| ha_commit_one_phase(-1) | 0.000005 |
| query end | 0.000005 |
| closing tables | 0.000012 |
| freeing items | 0.000562 |
| logging slow query | 0.000005 |
| cleaning up | 0.000005 |
| sleeping | 0.000006 |
+--------------------------------+----------+
Which seems way too high for me. I've tried to replace the userid = 8 and (host, name, sampleid) IN
part of the first query to (userid, host, name, sampleid) IN
and this query runs about 0.5s - almost 2 times quicker, for reference, here's the query:
select metricid, lastrawvalue, receivedat, name, sampleid
FROM metric m
WHERE (userid, host, name, sampleid) IN ((8,'localhost','0.4350799184758216cpu-3/cpu-nice',0),
(8,'localhost','0.4350799184758216cpu-3/cpu-system',0),
(8,'localhost','0.4350799184758216cpu-3/cpu-idle',0),
(8,'localhost','0.4350799184758216cpu-3/cpu-wait',0),
(8,'localhost','0.4350799184758216cpu-3/cpu-interrupt',0),
(8,'localhost','0.4350799184758216cpu-3/cpu-softirq',0),
(8,'localhost','0.4350799184758216cpu-3/cpu-steal',0),
(8,'localhost','0.4350799184758216cpu-4/cpu-user',0),
(8,'localhost','0.4350799184758216cpu-4/cpu-nice',0),
(8,'localhost','0.4350799184758216cpu-4/cpu-system',0),
(8,'localhost','0.4350799184758216cpu-4/cpu-idle',0),
(8,'localhost','0.4350799184758216cpu-4/cpu-wait',0),
(8,'localhost','0.4350799184758216cpu-4/cpu-interrupt',0),
(8,'localhost','0.4350799184758216cpu-4/cpu-softirq',0),
(8,'localhost','0.4350799184758216cpu-4/cpu-steal',0),
(8,'localhost','_util/billing-bytes',0),
(8,'localhost','_util/billing-metrics',0));
its explain looks like this:
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: m
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 171121
Extra: Using where
1 row in set (0.00 sec)
Next I've updated the table to contain a single joined column:
alter table `metric` add `forindex` varchar(120) not null default '';
update metric set forindex = concat(userid,`host`,`name`,sampleid);
alter table metric add index `forindex` (`forindex`);
Updated the query to have only 1 string searched:
select metricid, lastrawvalue, receivedat, name, sampleid
FROM metric m
WHERE (forindex) IN (('8localhost0.4350799184758216cpu-3/cpu-nice0'),
('8localhost0.4350799184758216cpu-3/cpu-system0'),
('8localhost0.4350799184758216cpu-3/cpu-idle0'),
('8localhost0.4350799184758216cpu-3/cpu-wait0'),
('8localhost0.4350799184758216cpu-3/cpu-interrupt0'),
('8localhost0.4350799184758216cpu-3/cpu-softirq0'),
('8localhost0.4350799184758216cpu-3/cpu-steal0'),
('8localhost0.4350799184758216cpu-4/cpu-user0'),
('8localhost0.4350799184758216cpu-4/cpu-nice0'),
('8localhost0.4350799184758216cpu-4/cpu-system0'),
('8localhost0.4350799184758216cpu-4/cpu-idle0'),
('8localhost0.4350799184758216cpu-4/cpu-wait0'),
('8localhost0.4350799184758216cpu-4/cpu-interrupt0'),
('8localhost0.4350799184758216cpu-4/cpu-softirq0'),
('8localhost0.4350799184758216cpu-4/cpu-steal0'),
('8localhost_util/billing-bytes0'),
('8localhost_util/billing-metrics0'));
And now I get the same results in 0.00 sec! Explain is:
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: m
type: range
possible_keys: forindex
key: forindex
key_len: 362
ref: NULL
rows: 17
Extra: Using where
1 row in set (0.00 sec)
So to summarize, here are the results:
m.userid = X AND (host, name, sampleid) IN
- index used, 85560 rows scanned, runs in 0.9s(userid, host, name, sampleid) IN
- index not used, 171121 rows scanned, runs in 0.5sWhy does second query run faster than the first? And why is the third query so much faster than the rest? Should I keep such a column for the sole purpose of faster searching?
Mysql version is:
mysqld Ver 5.5.34-55 for Linux on x86_64 (Percona XtraDB Cluster (GPL), wsrep_25.9.r3928)
Indexes are used to find rows with specific column values quickly. Without an index, MySQL must begin with the first row and then read through the entire table to find the relevant rows. The larger the table, the more this costs.
To see the index for a specific table use SHOW INDEX: SHOW INDEX FROM yourtable; To see indexes for all tables within a specific schema you can use the STATISTICS table from INFORMATION_SCHEMA: SELECT DISTINCT TABLE_NAME, INDEX_NAME FROM INFORMATION_SCHEMA.
Practically, indexes are also a type of tables, which keep primary key or index field and a pointer to each record into the actual table. The users cannot see the indexes, they are just used to speed up queries and will be used by the Database Search Engine to locate records very fast.
Indexes help your search terms in the WHERE
clause by narrowing down the search as much as possible. You can see this happening...
The rows
field of EXPLAIN gives an estimate of how many rows the query will have to examine to find the rows that match your query. By comparing the rows
reported in each EXPLAIN, you can see how much better your better-optimized query is:
rows: 85560 -- first query
rows: 171121 -- second query examines 2x more rows, but it was probably
-- faster because the data was buffered after the first query
rows: 17 -- third query examines 5,000x fewer rows than first query
You would also notice in the SHOW PROFILE details if you ran that for the third query that "Sending data" is a lot faster for the quicker query. This process state indicates how long it took to copy rows from the storage engine up to the SQL layer of MySQL. Even when doing memory-to-memory copying, this takes a while for so many thousands of rows. This is why indexes are so beneficial.
For more useful explanation, see my presentation How to Design Indexes, Really.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With