Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why does removing this index in MySQL speed up my query 100x?

Tags:

indexing

mysql

I have the following MySQL table (simplified):

CREATE TABLE `track` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `title` varchar(256) NOT NULL,
  `is_active` tinyint(1) NOT NULL,
  PRIMARY KEY (`id`),
  KEY `is_active` (`is_active`, `id`)
) ENGINE=MyISAM AUTO_INCREMENT=7495088 DEFAULT CHARSET=utf8

The 'is_active' column marks rows that I want to ignore in most, but not all, of my queries. I have some queries that read chunks out of this table periodically. One of them looks like this:

SELECT id,title from track where (track.is_active=1 and track.id > 5580702) ORDER BY id ASC LIMIT 10;

This query takes over a minute to execute. Here's the execution plan:

> EXPLAIN SELECT id,title from track where (track.is_active=1 and track.id > 5580702) ORDER BY id ASC LIMIT 10;
+----+-------------+-------+------+----------------+--------+---------+-------+---------+-------------+
| id | select_type | table | type | possible_keys  | key    | key_len | ref   | rows    | Extra       |
+----+-------------+-------+------+----------------+--------+---------+-------+---------+-------------+
|  1 | SIMPLE      | t     | ref  | PRIMARY,is_active | is_active | 1       | const | 3747543 | Using where |
+----+-------------+-------+------+----------------+--------+---------+-------+---------+-------------+

Now, if I tell MySQL to ignore the 'is_active' index, the query happens instantaneously.

> EXPLAIN SELECT id,title from track IGNORE INDEX(is_active) WHERE (track.is_active=1 AND track.id > 5580702) ORDER BY id ASC LIMIT 10;
+----+-------------+-------+-------+---------------+---------+---------+------+---------+-------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref  | rows    | Extra       |
+----+-------------+-------+-------+---------------+---------+---------+------+---------+-------------+
|  1 | SIMPLE      | t     | range | PRIMARY       | PRIMARY | 4       | NULL | 1597518 | Using where |
+----+-------------+-------+-------+---------------+---------+---------+------+---------+-------------+

Now, what's really strange is that if I FORCE MySQL to use the 'is_active' index, the query once again happens instantaneously!

+----+-------------+-------+-------+---------------+---------+---------+------+---------+-------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref  | rows    | Extra       |
+----+-------------+-------+-------+---------------+---------+---------+------+---------+-------------+
|  1 | SIMPLE      | t     | range | is_active     |is_active| 5       | NULL | 1866730 | Using where |
+----+-------------+-------+-------+---------------+---------+---------+------+---------+-------------+

I just don't understand this behavior. In the 'is_active' index, rows should be sorted by is_active, followed by id. I use both the 'is_active' and 'id' columns in my query, so it seems like it should only need to do a few hops around the tree to find the IDs, then use those IDs to retrieve the titles from the table.

What's going on?

EDIT: More info on what I'm doing:

  • Query cache is disabled
  • Running OPTIMIZE TABLE and ANALYZE TABLE had no effect
  • 6,620,372 rows have 'is_active' set to True. 874,714 rows have 'is_active' set to False.
  • Using FORCE INDEX(is_active) once again speeds up the query.
  • MySQL version 5.1.54
like image 360
cwick Avatar asked Sep 29 '11 01:09

cwick


People also ask

How do indexes speed up queries?

Indexing makes columns faster to query by creating pointers to where data is stored within a database. Imagine you want to find a piece of information that is within a large database. To get this information out of the database the computer will look through every row until it finds it.

How to use indexing in MySQL query?

You can create a simple index on a table. Just omit the UNIQUE keyword from the query to create a simple index. A Simple index allows duplicate values in a table. If you want to index the values in a column in a descending order, you can add the reserved word DESC after the column name.

What is index in MySQL What is advantage of index?

Advantages of MySQL Indexes 1- Indexes make search queries much faster. 2- Indexes like primary key index and unique index help to avoid duplicate row data. 3- Full-text indexes in MySQL, users have the opportunity to optimize searching against even large amounts of text located in any field indexed as such.


2 Answers

It looks like MySQL is making a poor decision about how to use the index.

From that query plan, it is showing it could have used either the PRIMARY or is_active index, and it has chosen is_active in order to narrow by track.is_active first. However, it is only using the first column of the index (track.is_active). That gets it 3747543 results which then have to be filtered and sorted.

If it had chosen the PRIMARY index, it would be able to narrow down to 1597518 rows using the index, and they would be retrieved in order of track.id already, which should require no further sorting. That would be faster.

New information:

In the third case where you are using FORCE INDEX, MySQL is using the is_active index but now instead of only using the first column, it is using both columns (see key_len). It is therefore now able to narrow by is_active and sort and filter by id using the same index, and since is_active is a single constant, the ORDER BY is satisfied by the second column (ie the rows from a single branch of the index are already in sorted order). This seems to be an even better outcome than using PRIMARY - and probably what you intended in the first place, right?

I don't know why it wasn't using both columns of this index without FORCE INDEX, unless the query has changed in a subtle way in between. If not I'd put it down to MySQL making bad decisions.

like image 137
thomasrutter Avatar answered Oct 24 '22 13:10

thomasrutter


I think the speedup is due to your where clause. I am assuming that it is only retrieving a small subset of the rows in the entire large table. It is faster to do a table scan of the retrieved data for is_active on the small subset than to do the filtering through a large index file. Traversing a single column index is much faster than traversing a combined index.

like image 26
Rasika Avatar answered Oct 24 '22 13:10

Rasika