Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why index made this query slower?

Tags:

indexing

mysql

Table has 1 500 000 records, 1 250 000 of them have field = 'z'.
I need select random not 'z' field.

$random = mt_rand(1, 250000);  
$query = "SELECT field FROM table WHERE field != 'z' LIMIT $random, 1";

It is working ok.

Then I decided to optimize it and indexed field in table.

Result was strange - it was slower ~3 times. I tested it.
Why it is slower? Is not such indexing should make it faster?

my ISAM

explain with index:  
id  select_type  table  type  possible_keys  key   key_len  ref  rows     Extra  
1   SIMPLE       table  range field          field 758      NULL 1139287  Using  

explain without index:  
id  select_type  table  type  possible_keys  key  key_len  ref  rows     Extra  
1   SIMPLE       table  ALL   NULL           NULL NULL     NULL 1484672  Using where
like image 420
Qiao Avatar asked Jul 30 '10 22:07

Qiao


People also ask

Why does index query slow down?

Why? Because sequentially scanning an index is actually less efficient than sequentially scanning a table, for indexes with a large number of rows for a given key.

Can an index make a query slower?

As shown, indexes can speed up some queries and slow down others.

Why does index make query faster?

Indexing makes columns faster to query by creating pointers to where data is stored within a database. Imagine you want to find a piece of information that is within a large database. To get this information out of the database the computer will look through every row until it finds it.

How can I make my index faster?

Go to Control Panel | Indexing Options to monitor the indexing. The DisableBackOff = 1 option makes the indexing go faster than the default value. You can continue to work on the computer but indexing will continue in the background and is less likely to pause when other programs are running.


1 Answers

Summary

The problem is that field is not a good candidate for indexing, due to the nature of b-trees.

Explanation

Let's suppose you have a table that has the results of 500,000 coin tosses, where the toss is either 1 (heads) or 0 (tails):

CREATE TABLE toss (
    id int NOT NULL AUTO_INCREMENT,
    result int NOT NULL DEFAULT '0',
    PRIMARY KEY ( id )
)

select result, count(*) from toss group by result order by result;
+--------+----------+
| result | count(*) |
+--------+----------+
|      0 |   250290 |
|      1 |   249710 |
+--------+----------+
2 rows in set (0.40 sec)

If you want to select one toss (at random) where the toss was tails, then you need to search through your table, picking a random starting place.

select * from toss where result != 1 limit 123456, 1;
+--------+--------+
| id     | result |
+--------+--------+
| 246700 |      0 |
+--------+--------+
1 row in set (0.06 sec)

explain select * from toss where result != 1 limit 123456, 1;
+----+-------------+-------+------+---------------+------+---------+------+--------+-------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows   | Extra       |
+----+-------------+-------+------+---------------+------+---------+------+--------+-------------+
|  1 | SIMPLE      | toss  | ALL  | NULL          | NULL | NULL    | NULL | 500000 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+--------+-------------+

You see that you're basically searching sequentially through all of the rows to find a match.

If you create an index on the toss field, then your index will contain two values, each with roughly 250,000 entries.

create index foo on toss ( result );
Query OK, 500000 rows affected (2.48 sec)
Records: 500000  Duplicates: 0  Warnings: 0

select * from toss where result != 1 limit 123456, 1;
+--------+--------+
| id     | result |
+--------+--------+
| 246700 |      0 |
+--------+--------+
1 row in set (0.25 sec)

explain select * from toss where result != 1 limit 123456, 1;
+----+-------------+-------+-------+---------------+------+---------+------+--------+-------------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref  | rows   | Extra       |
+----+-------------+-------+-------+---------------+------+---------+------+--------+-------------+
|  1 | SIMPLE      | toss  | range | foo           | foo  | 4       | NULL | 154565 | Using where |
+----+-------------+-------+-------+---------------+------+---------+------+--------+-------------+

Now you're searching fewer records, but the time to search increased from 0.06 to 0.25 seconds. Why? Because sequentially scanning an index is actually less efficient than sequentially scanning a table, for indexes with a large number of rows for a given key.

Let's look at the indexes on this table:

show index from toss;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| toss  |          0 | PRIMARY  |            1 | id          | A         |      500000 |     NULL | NULL   |      | BTREE      |         |
| toss  |          1 | foo      |            1 | result      | A         |           2 |     NULL | NULL   |      | BTREE      |         |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+

The PRIMARY index is a good index: there are 500,000 rows, and there are 500,000 values. Arranged in a BTREE, you can quickly identify a single row based on the id.

The foo index is a bad index: there are 500,000 rows, but only 2 possible values. This is pretty much the worst possible case for a BTREE -- all of the overhead of searching the index, and still having to search through the results.

like image 141
Craig Trader Avatar answered Oct 09 '22 09:10

Craig Trader