Table has 1 500 000 records, 1 250 000 of them have field = 'z'.
I need select random not 'z' field.
$random = mt_rand(1, 250000);
$query = "SELECT field FROM table WHERE field != 'z' LIMIT $random, 1";
It is working ok.
Then I decided to optimize it and indexed field
in table.
Result was strange - it was slower ~3 times. I tested it.
Why it is slower? Is not such indexing should make it faster?
my ISAM
explain with index:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE table range field field 758 NULL 1139287 Using
explain without index:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE table ALL NULL NULL NULL NULL 1484672 Using where
Why? Because sequentially scanning an index is actually less efficient than sequentially scanning a table, for indexes with a large number of rows for a given key.
As shown, indexes can speed up some queries and slow down others.
Indexing makes columns faster to query by creating pointers to where data is stored within a database. Imagine you want to find a piece of information that is within a large database. To get this information out of the database the computer will look through every row until it finds it.
Go to Control Panel | Indexing Options to monitor the indexing. The DisableBackOff = 1 option makes the indexing go faster than the default value. You can continue to work on the computer but indexing will continue in the background and is less likely to pause when other programs are running.
Summary
The problem is that field
is not a good candidate for indexing, due to the nature of b-trees.
Explanation
Let's suppose you have a table that has the results of 500,000 coin tosses, where the toss is either 1
(heads) or 0
(tails):
CREATE TABLE toss (
id int NOT NULL AUTO_INCREMENT,
result int NOT NULL DEFAULT '0',
PRIMARY KEY ( id )
)
select result, count(*) from toss group by result order by result;
+--------+----------+
| result | count(*) |
+--------+----------+
| 0 | 250290 |
| 1 | 249710 |
+--------+----------+
2 rows in set (0.40 sec)
If you want to select one toss (at random) where the toss was tails, then you need to search through your table, picking a random starting place.
select * from toss where result != 1 limit 123456, 1;
+--------+--------+
| id | result |
+--------+--------+
| 246700 | 0 |
+--------+--------+
1 row in set (0.06 sec)
explain select * from toss where result != 1 limit 123456, 1;
+----+-------------+-------+------+---------------+------+---------+------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+--------+-------------+
| 1 | SIMPLE | toss | ALL | NULL | NULL | NULL | NULL | 500000 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+--------+-------------+
You see that you're basically searching sequentially through all of the rows to find a match.
If you create an index on the toss
field, then your index will contain two values, each with roughly 250,000 entries.
create index foo on toss ( result );
Query OK, 500000 rows affected (2.48 sec)
Records: 500000 Duplicates: 0 Warnings: 0
select * from toss where result != 1 limit 123456, 1;
+--------+--------+
| id | result |
+--------+--------+
| 246700 | 0 |
+--------+--------+
1 row in set (0.25 sec)
explain select * from toss where result != 1 limit 123456, 1;
+----+-------------+-------+-------+---------------+------+---------+------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+------+---------+------+--------+-------------+
| 1 | SIMPLE | toss | range | foo | foo | 4 | NULL | 154565 | Using where |
+----+-------------+-------+-------+---------------+------+---------+------+--------+-------------+
Now you're searching fewer records, but the time to search increased from 0.06 to 0.25 seconds. Why? Because sequentially scanning an index is actually less efficient than sequentially scanning a table, for indexes with a large number of rows for a given key.
Let's look at the indexes on this table:
show index from toss;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| toss | 0 | PRIMARY | 1 | id | A | 500000 | NULL | NULL | | BTREE | |
| toss | 1 | foo | 1 | result | A | 2 | NULL | NULL | | BTREE | |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
The PRIMARY index is a good index: there are 500,000 rows, and there are 500,000 values. Arranged in a BTREE, you can quickly identify a single row based on the id.
The foo index is a bad index: there are 500,000 rows, but only 2 possible values. This is pretty much the worst possible case for a BTREE -- all of the overhead of searching the index, and still having to search through the results.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With