Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Need help understanding how mysql indexes work

I have a table that looks like this:

CREATE TABLE `metric` (
  `metricid` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `host` varchar(50) NOT NULL,
  `userid` int(10) unsigned DEFAULT NULL,
  `lastmetricvalue` double DEFAULT NULL,
  `receivedat` int(10) unsigned DEFAULT NULL,
  `name` varchar(255) NOT NULL,
  `sampleid` tinyint(3) unsigned NOT NULL,
  `type` tinyint(3) unsigned NOT NULL DEFAULT '0',
  `lastrawvalue` double NOT NULL,
  `priority` tinyint(3) unsigned NOT NULL DEFAULT '0',
  PRIMARY KEY (`metricid`),
  UNIQUE KEY `unique-metric` (`userid`,`host`,`name`,`sampleid`)
) ENGINE=InnoDB AUTO_INCREMENT=1000000221496 DEFAULT CHARSET=utf8

It has 177,892 rows at the moment, and when I run the following query:

select metricid, lastrawvalue, receivedat, name, sampleid
FROM metric m
WHERE m.userid = 8
  AND (host, name, sampleid) IN (('localhost','0.4350799184758216cpu-3/cpu-nice',0),
  ('localhost','0.4350799184758216cpu-3/cpu-system',0),
  ('localhost','0.4350799184758216cpu-3/cpu-idle',0),
  ('localhost','0.4350799184758216cpu-3/cpu-wait',0),
  ('localhost','0.4350799184758216cpu-3/cpu-interrupt',0),
  ('localhost','0.4350799184758216cpu-3/cpu-softirq',0),
  ('localhost','0.4350799184758216cpu-3/cpu-steal',0),
  ('localhost','0.4350799184758216cpu-4/cpu-user',0),
  ('localhost','0.4350799184758216cpu-4/cpu-nice',0),
  ('localhost','0.4350799184758216cpu-4/cpu-system',0),
  ('localhost','0.4350799184758216cpu-4/cpu-idle',0),
  ('localhost','0.4350799184758216cpu-4/cpu-wait',0),
  ('localhost','0.4350799184758216cpu-4/cpu-interrupt',0),
  ('localhost','0.4350799184758216cpu-4/cpu-softirq',0),
  ('localhost','0.4350799184758216cpu-4/cpu-steal',0),
  ('localhost','_util/billing-bytes',0),('localhost','_util/billing-metrics',0));

it takes 0.87 seconds to return results, explain is:

    *************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: m
         type: ref
possible_keys: unique-metric
          key: unique-metric
      key_len: 5
          ref: const
         rows: 85560
        Extra: Using where
1 row in set (0.00 sec)

profile looks like this:

+--------------------------------+----------+
| Status                         | Duration |
+--------------------------------+----------+
| starting                       | 0.000160 |
| checking permissions           | 0.000010 |
| Opening tables                 | 0.000021 |
| exit open_tables()             | 0.000008 |
| System lock                    | 0.000008 |
| mysql_lock_tables(): unlocking | 0.000005 |
| exit mysqld_lock_tables()      | 0.000007 |
| init                           | 0.000068 |
| optimizing                     | 0.000018 |
| statistics                     | 0.000091 |
| preparing                      | 0.000042 |
| executing                      | 0.000005 |
| Sending data                   | 0.870180 |
| innobase_commit_low():trx_comm | 0.000012 |
| Sending data                   | 0.000111 |
| end                            | 0.000009 |
| query end                      | 0.000009 |
| ha_commit_one_phase(-1)        | 0.000015 |
| innobase_commit_low():trx_comm | 0.000004 |
| ha_commit_one_phase(-1)        | 0.000005 |
| query end                      | 0.000005 |
| closing tables                 | 0.000012 |
| freeing items                  | 0.000562 |
| logging slow query             | 0.000005 |
| cleaning up                    | 0.000005 |
| sleeping                       | 0.000006 |
+--------------------------------+----------+

Which seems way too high for me. I've tried to replace the userid = 8 and (host, name, sampleid) IN part of the first query to (userid, host, name, sampleid) IN and this query runs about 0.5s - almost 2 times quicker, for reference, here's the query:

select metricid, lastrawvalue, receivedat, name, sampleid
FROM metric m
WHERE (userid, host, name, sampleid) IN ((8,'localhost','0.4350799184758216cpu-3/cpu-nice',0),
  (8,'localhost','0.4350799184758216cpu-3/cpu-system',0),
  (8,'localhost','0.4350799184758216cpu-3/cpu-idle',0),
  (8,'localhost','0.4350799184758216cpu-3/cpu-wait',0),
  (8,'localhost','0.4350799184758216cpu-3/cpu-interrupt',0),
  (8,'localhost','0.4350799184758216cpu-3/cpu-softirq',0),
  (8,'localhost','0.4350799184758216cpu-3/cpu-steal',0),
  (8,'localhost','0.4350799184758216cpu-4/cpu-user',0),
  (8,'localhost','0.4350799184758216cpu-4/cpu-nice',0),
  (8,'localhost','0.4350799184758216cpu-4/cpu-system',0),
  (8,'localhost','0.4350799184758216cpu-4/cpu-idle',0),
  (8,'localhost','0.4350799184758216cpu-4/cpu-wait',0),
  (8,'localhost','0.4350799184758216cpu-4/cpu-interrupt',0),
  (8,'localhost','0.4350799184758216cpu-4/cpu-softirq',0),
  (8,'localhost','0.4350799184758216cpu-4/cpu-steal',0),
  (8,'localhost','_util/billing-bytes',0),
  (8,'localhost','_util/billing-metrics',0));

its explain looks like this:

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: m
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 171121
        Extra: Using where
1 row in set (0.00 sec)

Next I've updated the table to contain a single joined column:

alter table `metric` add `forindex` varchar(120) not null default '';
update metric set forindex = concat(userid,`host`,`name`,sampleid);
alter table metric add index `forindex` (`forindex`);

Updated the query to have only 1 string searched:

select metricid, lastrawvalue, receivedat, name, sampleid
FROM metric m
WHERE (forindex) IN (('8localhost0.4350799184758216cpu-3/cpu-nice0'),
  ('8localhost0.4350799184758216cpu-3/cpu-system0'),
  ('8localhost0.4350799184758216cpu-3/cpu-idle0'),
  ('8localhost0.4350799184758216cpu-3/cpu-wait0'),
  ('8localhost0.4350799184758216cpu-3/cpu-interrupt0'),
  ('8localhost0.4350799184758216cpu-3/cpu-softirq0'),
  ('8localhost0.4350799184758216cpu-3/cpu-steal0'),
  ('8localhost0.4350799184758216cpu-4/cpu-user0'),
  ('8localhost0.4350799184758216cpu-4/cpu-nice0'),
  ('8localhost0.4350799184758216cpu-4/cpu-system0'),
  ('8localhost0.4350799184758216cpu-4/cpu-idle0'),
  ('8localhost0.4350799184758216cpu-4/cpu-wait0'),
  ('8localhost0.4350799184758216cpu-4/cpu-interrupt0'),
  ('8localhost0.4350799184758216cpu-4/cpu-softirq0'),
  ('8localhost0.4350799184758216cpu-4/cpu-steal0'),
  ('8localhost_util/billing-bytes0'),
  ('8localhost_util/billing-metrics0'));

And now I get the same results in 0.00 sec! Explain is:

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: m
         type: range
possible_keys: forindex
          key: forindex
      key_len: 362
          ref: NULL
         rows: 17
        Extra: Using where
1 row in set (0.00 sec)

So to summarize, here are the results:

  1. m.userid = X AND (host, name, sampleid) IN - index used, 85560 rows scanned, runs in 0.9s
  2. (userid, host, name, sampleid) IN - index not used, 171121 rows scanned, runs in 0.5s
  3. additional column with compound index replaced with an index over a concatenated utility column - index used, 17 rows scanned, runs in 0s

Why does second query run faster than the first? And why is the third query so much faster than the rest? Should I keep such a column for the sole purpose of faster searching?

Mysql version is: mysqld Ver 5.5.34-55 for Linux on x86_64 (Percona XtraDB Cluster (GPL), wsrep_25.9.r3928)

like image 751
Fluffy Avatar asked Apr 10 '14 17:04

Fluffy


People also ask

How does indexes work in MySQL?

Indexes are used to find rows with specific column values quickly. Without an index, MySQL must begin with the first row and then read through the entire table to find the relevant rows. The larger the table, the more this costs.

How do I know if indexing is working in MySQL?

To see the index for a specific table use SHOW INDEX: SHOW INDEX FROM yourtable; To see indexes for all tables within a specific schema you can use the STATISTICS table from INFORMATION_SCHEMA: SELECT DISTINCT TABLE_NAME, INDEX_NAME FROM INFORMATION_SCHEMA.

What is indexes in MySQL with example?

Practically, indexes are also a type of tables, which keep primary key or index field and a pointer to each record into the actual table. The users cannot see the indexes, they are just used to speed up queries and will be used by the Database Search Engine to locate records very fast.


1 Answers

Indexes help your search terms in the WHERE clause by narrowing down the search as much as possible. You can see this happening...

The rows field of EXPLAIN gives an estimate of how many rows the query will have to examine to find the rows that match your query. By comparing the rows reported in each EXPLAIN, you can see how much better your better-optimized query is:

     rows: 85560 -- first query

     rows: 171121 -- second query examines 2x more rows, but it was probably 
                  -- faster because the data was buffered after the first query

     rows: 17 -- third query examines 5,000x fewer rows than first query

You would also notice in the SHOW PROFILE details if you ran that for the third query that "Sending data" is a lot faster for the quicker query. This process state indicates how long it took to copy rows from the storage engine up to the SQL layer of MySQL. Even when doing memory-to-memory copying, this takes a while for so many thousands of rows. This is why indexes are so beneficial.

For more useful explanation, see my presentation How to Design Indexes, Really.

like image 197
Bill Karwin Avatar answered Sep 18 '22 23:09

Bill Karwin