Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

MYSQL - NOT vs var=false

Tags:

sql

mysql

In the last past days I noticed something weird optimizing my query. I have a simple query which does something like:

   SELECT id,name,amount FROM reservations WHERE NOT canceled ORDER BY name ASC

I noticed mysql wasn't using any index, so I started doing some experiments. Accidentally I replaced the "NOT canceled" with "canceled=false", and then, Mysql started using "canceled" as index. After that I tried using the opposite:

   SELECT ... FROM reservations WHERE canceled ORDER BY ...

Same result! When I change that to "canceled=true" the index works again.

My question is: HOW COME?! Isn't using "NOT" the "elegant" way? Anyhow I didn't expect for it to make any difference.

I'm using InnoDB as the engine, but i get same result using MyISAM. Can someone clarify things up? Thanks.

Edit: Table structure

CREATE TABLE `reservations` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `trip_code` varchar(10) DEFAULT NULL,
  `departure_date` date DEFAULT NULL,
  `amount` float DEFAULT NULL,
  `name` varchar(45) DEFAULT NULL,
  `canceled` tinyint(1) NOT NULL DEFAULT '0',
  `created_date` date NOT NULL,
  `creator_user` int(11) NOT NULL DEFAULT '1',
  `last_update_user` int(11) NOT NULL DEFAULT '1',
  PRIMARY KEY (`id`),
  KEY `trip_code` (`trip_code`),
  KEY `departure_date` (`departure_date`),
  KEY `created_date` (`created_date`),
  KEY `canceled` (`canceled`)
) ENGINE=MyISAM  DEFAULT CHARSET=utf8 AUTO_INCREMENT=123181 ;
like image 563
Phoenix Avatar asked Nov 07 '11 10:11

Phoenix


3 Answers

Even though it's using an index, the index (believe it or not) may make your query slower. It's a little weird, but it's related to index selectivity. It's generally presented in columns of type boolean.

It's descrbed like:

"How different values of a field are. It is a number from 0-1, although you can also think of it as a percentage. A value of 1, or 100%, means that each value in the field is unique"

It's important to consider becouse:

"MySQL has a cost-based optimizer. This means that MySQL calculates the costs of different ways of performing a query and then chooses the cheapest one. Well, calculating the costs is an inexact science. So an estimate is taken, and the estimate is wrong sometimes."

Plain simple:

If the data you're looking has more or less 20% of the same value (for example, cancelled has 40% of your table) then, it's simple to just do a table scan.

EDIT:

Regarding your question, EXPLAIN tells you that MySQL is using an index. But, it might not be good, the only way to note whether your optimization is better is to test performance. Also, consider the costo of INSERT, UPDATE and DELETE operations to keep that index. Do some profiling with and without the index.

Take a look at this:

  • http://sheeri.com/archives/77
  • http://www.mysqlperformanceblog.com/2006/06/02/indexes-in-mysql/
like image 119
santiagobasulto Avatar answered Oct 27 '22 19:10

santiagobasulto


I am not familiar with MYSQL, but thinking logically, I understand it like this:
Index is like a phone book, when you are searching for "Cohen", you can get it right away.
But if you are looking for NOT "Cohen", you will have to run over every entry, and check if it's different from "Cohen".
So when you are looking for specific value, it looks just for it. And when you are using NOT, it looks for any other value that can fit inside tinyint(1) (as I understand it's not only 1 or 0, is it?).

like image 43
Oleg Grishko Avatar answered Oct 27 '22 21:10

Oleg Grishko


SELECT *
FROM 
(SELECT 1 AS C, 0 AS X UNION ALL
SELECT 2 AS C, 1 AS X UNION ALL
SELECT 3 AS C, 2 AS X ) T
WHERE X=true

Returns

'2', '1'

And

SELECT *
FROM 
(SELECT 1 AS C, 0 AS X UNION ALL
SELECT 2 AS C, 1 AS X UNION ALL
SELECT 3 AS C, 2 AS X ) T
WHERE X

Returns

'2', '1'
'3', '2'

So it seems that in the first case the true gets cast to int and then used in a seekable predicate whereas in the second case the column value is implicitly cast. Implicit casts generally make a condition unsargable.

Looking at the explain plan for your query with WHERE canceled = true gives

+----+-------------+--------------+------+---------------+----------+---------+-------+------+-----------------------------+
| id | select_type |    table     | type | possible_keys |   key    | key_len |  ref  | rows |            Extra            |
+----+-------------+--------------+------+---------------+----------+---------+-------+------+-----------------------------+
|  1 | SIMPLE      | reservations | ref  | canceled      | canceled |       1 | const |    1 | Using where; Using filesort |
+----+-------------+--------------+------+---------------+----------+---------+-------+------+-----------------------------+

Whereas for WHERE canceled you get

+----+-------------+--------------+------+---------------+-----+---------+-----+------+-----------------------------+
| id | select_type |    table     | type | possible_keys | key | key_len | ref | rows |            Extra            |
+----+-------------+--------------+------+---------------+-----+---------+-----+------+-----------------------------+
|  1 | SIMPLE      | reservations | ALL  |               |     |         |     |    2 | Using where; Using filesort |
+----+-------------+--------------+------+---------------+-----+---------+-----+------+-----------------------------+

So it appears that it can't even consider the index on canceled as a possible option in this case.

like image 38
Martin Smith Avatar answered Oct 27 '22 19:10

Martin Smith