In the last past days I noticed something weird optimizing my query. I have a simple query which does something like:
SELECT id,name,amount FROM reservations WHERE NOT canceled ORDER BY name ASC
I noticed mysql wasn't using any index, so I started doing some experiments. Accidentally I replaced the "NOT canceled" with "canceled=false", and then, Mysql started using "canceled" as index. After that I tried using the opposite:
SELECT ... FROM reservations WHERE canceled ORDER BY ...
Same result! When I change that to "canceled=true" the index works again.
My question is: HOW COME?! Isn't using "NOT" the "elegant" way? Anyhow I didn't expect for it to make any difference.
I'm using InnoDB as the engine, but i get same result using MyISAM. Can someone clarify things up? Thanks.
Edit: Table structure
CREATE TABLE `reservations` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`trip_code` varchar(10) DEFAULT NULL,
`departure_date` date DEFAULT NULL,
`amount` float DEFAULT NULL,
`name` varchar(45) DEFAULT NULL,
`canceled` tinyint(1) NOT NULL DEFAULT '0',
`created_date` date NOT NULL,
`creator_user` int(11) NOT NULL DEFAULT '1',
`last_update_user` int(11) NOT NULL DEFAULT '1',
PRIMARY KEY (`id`),
KEY `trip_code` (`trip_code`),
KEY `departure_date` (`departure_date`),
KEY `created_date` (`created_date`),
KEY `canceled` (`canceled`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=123181 ;
Even though it's using an index, the index (believe it or not) may make your query slower. It's a little weird, but it's related to index selectivity. It's generally presented in columns of type boolean.
It's descrbed like:
"How different values of a field are. It is a number from 0-1, although you can also think of it as a percentage. A value of 1, or 100%, means that each value in the field is unique"
It's important to consider becouse:
"MySQL has a cost-based optimizer. This means that MySQL calculates the costs of different ways of performing a query and then chooses the cheapest one. Well, calculating the costs is an inexact science. So an estimate is taken, and the estimate is wrong sometimes."
Plain simple:
If the data you're looking has more or less 20% of the same value (for example, cancelled has 40% of your table) then, it's simple to just do a table scan.
EDIT:
Regarding your question, EXPLAIN tells you that MySQL is using an index. But, it might not be good, the only way to note whether your optimization is better is to test performance. Also, consider the costo of INSERT, UPDATE and DELETE operations to keep that index. Do some profiling with and without the index.
Take a look at this:
I am not familiar with MYSQL, but thinking logically, I understand it like this:
Index is like a phone book, when you are searching for "Cohen", you can get it right away.
But if you are looking for NOT "Cohen", you will have to run over every entry, and check if it's different from "Cohen".
So when you are looking for specific value, it looks just for it. And when you are using NOT, it looks for any other value that can fit inside tinyint(1)
(as I understand it's not only 1
or 0
, is it?).
SELECT *
FROM
(SELECT 1 AS C, 0 AS X UNION ALL
SELECT 2 AS C, 1 AS X UNION ALL
SELECT 3 AS C, 2 AS X ) T
WHERE X=true
Returns
'2', '1'
And
SELECT *
FROM
(SELECT 1 AS C, 0 AS X UNION ALL
SELECT 2 AS C, 1 AS X UNION ALL
SELECT 3 AS C, 2 AS X ) T
WHERE X
Returns
'2', '1'
'3', '2'
So it seems that in the first case the true
gets cast to int
and then used in a seekable predicate whereas in the second case the column value is implicitly cast. Implicit casts generally make a condition unsargable.
Looking at the explain plan for your query with WHERE canceled = true
gives
+----+-------------+--------------+------+---------------+----------+---------+-------+------+-----------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+------+---------------+----------+---------+-------+------+-----------------------------+
| 1 | SIMPLE | reservations | ref | canceled | canceled | 1 | const | 1 | Using where; Using filesort |
+----+-------------+--------------+------+---------------+----------+---------+-------+------+-----------------------------+
Whereas for WHERE canceled
you get
+----+-------------+--------------+------+---------------+-----+---------+-----+------+-----------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+------+---------------+-----+---------+-----+------+-----------------------------+
| 1 | SIMPLE | reservations | ALL | | | | | 2 | Using where; Using filesort |
+----+-------------+--------------+------+---------------+-----+---------+-----+------+-----------------------------+
So it appears that it can't even consider the index on canceled
as a possible option in this case.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With