Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

mysql not using indexes when "not in" statement is present

Tags:

mysql

The table structure is:

CREATE TABLE `test` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `from` int(10) unsigned NOT NULL,
  `to` int(10) unsigned NOT NULL,
  `message` text NOT NULL,
  `sent` int(10) unsigned NOT NULL DEFAULT '0',
  `read` tinyint(1) unsigned NOT NULL DEFAULT '0',
  `direction` tinyint(1) unsigned NOT NULL DEFAULT '0',
  PRIMARY KEY (`id`),
  KEY `one` (`to`,`direction`,`from`,`id`),
  KEY `two` (`from`,`direction`,`to`,`id`),
  KEY `three` (`read`,`direction`,`to`),
  KEY `four` (`read`,`direction`,`from`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;

I have a strange issue. Please have a look at the following query:

select test.id, test.from, test.to, test.message, test.sent, test.read, test.direction from test 
where (

    (test.to = 244975 and test.direction <> 2 and test.direction <> 3 and 
        (
        (test.from = 204177 and test.id > 5341203) OR 
        (test.from = 214518 and test.id > 5336549) OR
        (test.from = 231429 and test.id > 5338284) OR
        (test.from = 242739 and test.id > 5339541) OR
        (test.from = 243834 and test.id > 5340438) OR
        (test.from = 244354 and test.id > 5337489) OR
        (test.from = 244644 and test.id > 5338572) OR
        (test.from = 244690 and test.id > 5338467) 
        )

    )

    or 

    (test.from = 244975 and test.direction <> 1 and test.direction <> 3 and 
        (
        (test.to = 204177 and test.id > 5341203) OR
        (test.to = 214518 and test.id > 5336549) OR
        (test.to = 231429 and test.id > 5338284) OR
        (test.to = 242739 and test.id > 5339541) OR
        (test.to = 243834 and test.id > 5340438) OR
        (test.to = 244354 and test.id > 5337489) OR
        (test.to = 244644 and test.id > 5338572) OR
        (test.to = 244690 and test.id > 5338467)
        )
    )

    or 

    (test.read <> 1 and test.direction <> 3 and test.direction <> 2 and test.to = 244975  and test.from not in (204177, 214518, 231429, 242739, 243834, 244354, 244644, 244690)

    )

    or

    (test.read <> 1 and test.direction = 2 and test.from = 244975 and test.to not in (204177, 214518, 231429, 242739, 243834, 244354, 244644, 244690)

    )


     )



     order by test.id;

If I do an explain on this query, it goes through all the rows:

1   SIMPLE  test    index   PRIMARY,one,two,three,four  PRIMARY 4       1440596 Using where

If I remove both the "not in" statements, then it works fine:

select test.id, test.from, test.to, test.message, test.sent, test.read, test.direction from test 
where (

    (test.to = 244975 and test.direction <> 2 and test.direction <> 3 and 
        (
        (test.from = 204177 and test.id > 5341203) OR 
        (test.from = 214518 and test.id > 5336549) OR
        (test.from = 231429 and test.id > 5338284) OR
        (test.from = 242739 and test.id > 5339541) OR
        (test.from = 243834 and test.id > 5340438) OR
        (test.from = 244354 and test.id > 5337489) OR
        (test.from = 244644 and test.id > 5338572) OR
        (test.from = 244690 and test.id > 5338467) 
        )

    )

    or 

    (test.from = 244975 and test.direction <> 1 and test.direction <> 3 and 
        (
        (test.to = 204177 and test.id > 5341203) OR
        (test.to = 214518 and test.id > 5336549) OR
        (test.to = 231429 and test.id > 5338284) OR
        (test.to = 242739 and test.id > 5339541) OR
        (test.to = 243834 and test.id > 5340438) OR
        (test.to = 244354 and test.id > 5337489) OR
        (test.to = 244644 and test.id > 5338572) OR
        (test.to = 244690 and test.id > 5338467)
        )
    )

    or 

    (test.read <> 1 and test.direction <> 3 and test.direction <> 2 and test.to = 244975 

    )

    or

    (test.read <> 1 and test.direction = 2 and test.from = 244975 

    )


     )



     order by test.id;

Now the explain query returns:

1   SIMPLE  test    index_merge PRIMARY,one,two,three,four  one,two 5,5     30  Using sort_union(one,two); Using where; Using filesort

I am not sure why it does not work right. What am I missing in the indexes?

like image 639
Alec Smart Avatar asked Apr 01 '16 09:04

Alec Smart


4 Answers

I am not sure why it does not work right. What am I missing in the indexes?

I'm pretty sure the query planner is working perfectly, you're not missing anything in the indexes that would help in this case. The query planner decided that it would be faster to use a different index because the two queries are very different.

We can make the optimizer use a union of the indexes for us which will make it considerably faster. You can keep the not in and not change any of the or statements. I ran some basic benchmarks of the method I used against the union method. Caveats apply because your DB configuration may be vastly different than mine. Running the query a 1000 times and doing this 3 times I took the best time for each query...

Optimized Query shown below

real    0m15.410s
user    0m6.681s
sys 0m2.641s

Re-Written as a set of unions

real    0m17.747s
user    0m6.798s
sys 0m2.812s

Think like the optimizer and work with less data

The following SQL is several orders of magnitude faster in tests on a ~4 million row database. The key change is the following line

(select * from test where test.from_ in (244975, 204177, 214518, 231429, 242739, 243834, 244354, 244644, 244690) or test.to_ in (244975, 204177, 214518, 231429, 242739, 243834, 244354, 244644, 244690)) as test 

This one line is vastly reducing the dataset that mysql needs to work on because we're using in instead of not in. This is the new query, I've tried not to change the original query too much.

select SQL_NO_CACHE test.id, test.from_, test.to_, test.message, test.sent, test.read_, test.direction 
from (select * from test where test.from_ in (244975, 204177, 214518, 231429, 242739, 243834, 244354, 244644, 244690) or test.to_ in (244975, 204177, 214518, 231429, 242739, 243834, 244354, 244644, 244690)) as test 
where (
  (test.to_ = 244975 and test.direction <> 2 and test.direction <> 3 and test.from_ in (204177, 214518, 231429, 242739, 243834, 244354, 244644, 244690) and 
        (   
        (test.from_ = 204177 and test.id > 5341203) OR  
        (test.from_ = 214518 and test.id > 5336549) OR
        (test.from_ = 231429 and test.id > 5338284) OR
        (test.from_ = 242739 and test.id > 5339541) OR
        (test.from_ = 243834 and test.id > 5340438) OR
        (test.from_ = 244354 and test.id > 5337489) OR
        (test.from_ = 244644 and test.id > 5338572) OR
        (test.from_ = 244690 and test.id > 5338467) 
        )   
    )   
    or  
    (test.from_ = 244975 and test.direction <> 1 and test.direction <> 3 and test.to_ in (204177, 214518, 231429, 242739, 243834, 244354, 244644, 244690) and 
        (   
        (test.to_ = 204177 and test.id > 5341203) OR
        (test.to_ = 214518 and test.id > 5336549) OR
        (test.to_ = 231429 and test.id > 5338284) OR
        (test.to_ = 242739 and test.id > 5339541) OR
        (test.to_ = 243834 and test.id > 5340438) OR
        (test.to_ = 244354 and test.id > 5337489) OR
        (test.to_ = 244644 and test.id > 5338572) OR
        (test.to_ = 244690 and test.id > 5338467)
        ))  
    or  
    (test.read_ <> 1 and test.direction <> 2 and test.direction <> 3 and test.to_ = 244975  and test.from_ not in (204177, 214518, 231429, 242739, 243834, 244354, 244644, 244690))
    or  
    (test.read_ <> 1 and test.direction = 2 and test.from_ = 244975 and test.to_ not in (204177, 214518, 231429, 242739, 243834, 244354, 244644, 244690))
     )   
     order by test.id;

The explain plan for this looks very different...

mysql> \. sql_fixed.sql
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: <derived2>
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 226
     filtered: 100.00
        Extra: Using where; Using filesort
*************************** 2. row ***************************
           id: 2
  select_type: DERIVED
        table: test
         type: index_merge
possible_keys: one,two
          key: two,one
      key_len: 4,4
          ref: NULL
         rows: 226
     filtered: 100.00
        Extra: Using sort_union(two,one); Using where
2 rows in set, 1 warning (0.01 sec)

The optimizer being smart immediately can see that it doesn't need most of the data because up front we've told it to use an IN statement with a few keys. Most query optimizers attach a high cost to disk accesses so anything that reduces this it typically preferred by the optimizer.

NOT IN vs IN

not in and in are very different. The difference between these in this case is access pattern, do I need the data temporarily or as part of the result set. When you use not in with a few keys and the index holds millions of keys it might need to read lots of records if the data is part of the result set. Even when using indexes not in can be reading millions of records from disk... in with a few keys and these are the keys you need is asking to find and use a small subset. The two access patterns are very different. The following example might help make this clear...

1. I don't want these 10 items from a 1,000,000 records I need the other 999,990, this reads the whole index.
2. I only want these 10 from a 1,000,000 records. This might only require one disk seek.

Number 2 is faster because the of the access pattern ie I found the 10 I need, Nunmber 1. may need to read a million records.

MySQL's query optimizer is seeing this ie the last two OR statements are asking for large subsets of data from either the table or the index ie case 1. above. Seeing that and the fact that it needed to use the primary key anyway the optimizer decided it would be quicker to use the primary key.

When you remove the not in things change ie now the query planner can use the indexes because in the other two or clauses they're in effect get me the few from the many and it performs an index_merge on the two keys that share a to and a from column along with the id.

To see what I mean don't remove the 'not in' part of the query, change it to in to see what happens, on my machine the query plan changed to use a range index.

like image 162
Harry Avatar answered Nov 12 '22 13:11

Harry


If your mySQL version less then 5.0.7, the mysql issue can be reason

Have a look this ticket in MySQL bug tracking https://bugs.mysql.com/bug.php?id=10561

like image 25
Pavel Zimogorov Avatar answered Nov 12 '22 12:11

Pavel Zimogorov


Mixing AND and OR would often cause weird query plan on MySQL, in my experience. I don't have enough data on the table to test, but I would try rewriting your query using UNION ALL. After all, OR in WHERE is basically UNION.

The idea is to break it down on smaller conditions so MySQL can use different index optimized for each part as opposed to jamming all together.

SELECT * FROM (
SELECT
    test.id, test.from, test.to, test.message, test.sent, test.read, test.direction
FROM
    test 
WHERE
    test.to = 244975
    AND test.direction <> 2
    AND test.direction <> 3
    AND (
        (test.from = 204177 AND test.id > 5341203) OR 
        (test.from = 214518 AND test.id > 5336549) OR
        (test.from = 231429 AND test.id > 5338284) OR
        (test.from = 242739 AND test.id > 5339541) OR
        (test.from = 243834 AND test.id > 5340438) OR
        (test.from = 244354 AND test.id > 5337489) OR
        (test.from = 244644 AND test.id > 5338572) OR
        (test.from = 244690 AND test.id > 5338467) 
    )
UNION ALL
SELECT
    test.id, test.from, test.to, test.message, test.sent, test.read, test.direction
FROM
    test 
WHERE
    test.from = 244975
    AND test.direction <> 1
    AND test.direction <> 3
    AND (
        (test.to = 204177 and test.id > 5341203) OR
        (test.to = 214518 and test.id > 5336549) OR
        (test.to = 231429 and test.id > 5338284) OR
        (test.to = 242739 and test.id > 5339541) OR
        (test.to = 243834 and test.id > 5340438) OR
        (test.to = 244354 and test.id > 5337489) OR
        (test.to = 244644 and test.id > 5338572) OR
        (test.to = 244690 and test.id > 5338467)
    )
UNION ALL
SELECT
    test.id, test.from, test.to, test.message, test.sent, test.read, test.direction
FROM
    test 
WHERE
    test.read <> 1
    AND test.direction <> 3
    AND test.direction <> 2
    AND test.to = 244975
    AND test.from NOT IN (204177, 214518, 231429, 242739, 243834, 244354, 244644, 244690)
UNION ALL
SELECT
    test.id, test.from, test.to, test.message, test.sent, test.read, test.direction
FROM
    test 
WHERE
    test.read <> 1
    AND test.direction = 2
    AND test.from = 244975
    AND test.to NOT IN (204177, 214518, 231429, 242739, 243834, 244354, 244644, 244690)
) test ORDER BY test.id
like image 4
maresa Avatar answered Nov 12 '22 11:11

maresa


It would be good to have a dump of sample data to test with, but I created some of my own anyway. Next I split each of the four outer OR conditions into sub-queries, UNIONed them, and moved the ordering to the final result set.

I've had issues with indexes when using complex WHERE clauses and to me it looks like you have a chat/messaging app and are attempting to get messages to and from a particular user in a single query. Personally, I'd split these into individual queries to simplify the code/queries.

Here is my query:

SELECT test.id, test.from, test.to, test.message, test.sent, test.read, test.direction
FROM (
  SELECT *
  FROM test
  WHERE test.to = 244975
    AND test.direction not in (2,3)
    AND (
      (test.from = 204177 AND test.id > 5341203)
      OR (test.from = 214518 AND test.id > 5336549)
      OR (test.from = 231429 AND test.id > 5338284)
      OR (test.from = 242739 AND test.id > 5339541)
      OR (test.from = 243834 AND test.id > 5340438)
      OR (test.from = 244354 AND test.id > 5337489)
      OR (test.from = 244644 AND test.id > 5338572)
      OR (test.from = 244690 AND test.id > 5338467)
    )
  UNION
  SELECT *
  FROM test
  WHERE test.from = 244975
    AND test.direction not in (1,3)
    AND (
      (test.to = 204177 AND test.id > 5341203)
      OR (test.to = 214518 AND test.id > 5336549)
      OR (test.to = 231429 AND test.id > 5338284)
      OR (test.to = 242739 AND test.id > 5339541)
      OR (test.to = 243834 AND test.id > 5340438)
      OR (test.to = 244354 AND test.id > 5337489)
      OR (test.to = 244644 AND test.id > 5338572)
      OR (test.to = 244690 AND test.id > 5338467)
    )
  UNION
  SELECT *
  FROM test
  WHERE test.read != 1
    AND test.direction not in (2,3)
    AND test.to = 244975
    AND test.from not in (204177, 214518, 231429, 242739, 243834, 244354, 244644, 244690)
  UNION
  SELECT *
  FROM test
  WHERE test.read != 1
    AND test.direction = 2
    AND test.from = 244975
    AND test.to not in (204177, 214518, 231429, 242739, 243834, 244354, 244644, 244690)
) test
ORDER BY test.id;
like image 4
Brendan Smith Avatar answered Nov 12 '22 12:11

Brendan Smith