The table structure is:
CREATE TABLE `test` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`from` int(10) unsigned NOT NULL,
`to` int(10) unsigned NOT NULL,
`message` text NOT NULL,
`sent` int(10) unsigned NOT NULL DEFAULT '0',
`read` tinyint(1) unsigned NOT NULL DEFAULT '0',
`direction` tinyint(1) unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
KEY `one` (`to`,`direction`,`from`,`id`),
KEY `two` (`from`,`direction`,`to`,`id`),
KEY `three` (`read`,`direction`,`to`),
KEY `four` (`read`,`direction`,`from`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;
I have a strange issue. Please have a look at the following query:
select test.id, test.from, test.to, test.message, test.sent, test.read, test.direction from test
where (
(test.to = 244975 and test.direction <> 2 and test.direction <> 3 and
(
(test.from = 204177 and test.id > 5341203) OR
(test.from = 214518 and test.id > 5336549) OR
(test.from = 231429 and test.id > 5338284) OR
(test.from = 242739 and test.id > 5339541) OR
(test.from = 243834 and test.id > 5340438) OR
(test.from = 244354 and test.id > 5337489) OR
(test.from = 244644 and test.id > 5338572) OR
(test.from = 244690 and test.id > 5338467)
)
)
or
(test.from = 244975 and test.direction <> 1 and test.direction <> 3 and
(
(test.to = 204177 and test.id > 5341203) OR
(test.to = 214518 and test.id > 5336549) OR
(test.to = 231429 and test.id > 5338284) OR
(test.to = 242739 and test.id > 5339541) OR
(test.to = 243834 and test.id > 5340438) OR
(test.to = 244354 and test.id > 5337489) OR
(test.to = 244644 and test.id > 5338572) OR
(test.to = 244690 and test.id > 5338467)
)
)
or
(test.read <> 1 and test.direction <> 3 and test.direction <> 2 and test.to = 244975 and test.from not in (204177, 214518, 231429, 242739, 243834, 244354, 244644, 244690)
)
or
(test.read <> 1 and test.direction = 2 and test.from = 244975 and test.to not in (204177, 214518, 231429, 242739, 243834, 244354, 244644, 244690)
)
)
order by test.id;
If I do an explain on this query, it goes through all the rows:
1 SIMPLE test index PRIMARY,one,two,three,four PRIMARY 4 1440596 Using where
If I remove both the "not in" statements, then it works fine:
select test.id, test.from, test.to, test.message, test.sent, test.read, test.direction from test
where (
(test.to = 244975 and test.direction <> 2 and test.direction <> 3 and
(
(test.from = 204177 and test.id > 5341203) OR
(test.from = 214518 and test.id > 5336549) OR
(test.from = 231429 and test.id > 5338284) OR
(test.from = 242739 and test.id > 5339541) OR
(test.from = 243834 and test.id > 5340438) OR
(test.from = 244354 and test.id > 5337489) OR
(test.from = 244644 and test.id > 5338572) OR
(test.from = 244690 and test.id > 5338467)
)
)
or
(test.from = 244975 and test.direction <> 1 and test.direction <> 3 and
(
(test.to = 204177 and test.id > 5341203) OR
(test.to = 214518 and test.id > 5336549) OR
(test.to = 231429 and test.id > 5338284) OR
(test.to = 242739 and test.id > 5339541) OR
(test.to = 243834 and test.id > 5340438) OR
(test.to = 244354 and test.id > 5337489) OR
(test.to = 244644 and test.id > 5338572) OR
(test.to = 244690 and test.id > 5338467)
)
)
or
(test.read <> 1 and test.direction <> 3 and test.direction <> 2 and test.to = 244975
)
or
(test.read <> 1 and test.direction = 2 and test.from = 244975
)
)
order by test.id;
Now the explain query returns:
1 SIMPLE test index_merge PRIMARY,one,two,three,four one,two 5,5 30 Using sort_union(one,two); Using where; Using filesort
I am not sure why it does not work right. What am I missing in the indexes?
I am not sure why it does not work right. What am I missing in the indexes?
I'm pretty sure the query planner is working perfectly, you're not missing anything in the indexes that would help in this case. The query planner decided that it would be faster to use a different index because the two queries are very different.
We can make the optimizer use a union of the indexes for us which will make it considerably faster. You can keep the not in
and not change any of the or
statements. I ran some basic benchmarks of the method I used against the union method. Caveats apply because your DB configuration may be vastly different than mine. Running the query a 1000 times and doing this 3 times I took the best time for each query...
Optimized Query shown below
real 0m15.410s
user 0m6.681s
sys 0m2.641s
Re-Written as a set of unions
real 0m17.747s
user 0m6.798s
sys 0m2.812s
Think like the optimizer and work with less data
The following SQL is several orders of magnitude faster in tests on a ~4 million row database. The key change is the following line
(select * from test where test.from_ in (244975, 204177, 214518, 231429, 242739, 243834, 244354, 244644, 244690) or test.to_ in (244975, 204177, 214518, 231429, 242739, 243834, 244354, 244644, 244690)) as test
This one line is vastly reducing the dataset that mysql needs to work on because we're using in
instead of not in
. This is the new query, I've tried not to change the original query too much.
select SQL_NO_CACHE test.id, test.from_, test.to_, test.message, test.sent, test.read_, test.direction
from (select * from test where test.from_ in (244975, 204177, 214518, 231429, 242739, 243834, 244354, 244644, 244690) or test.to_ in (244975, 204177, 214518, 231429, 242739, 243834, 244354, 244644, 244690)) as test
where (
(test.to_ = 244975 and test.direction <> 2 and test.direction <> 3 and test.from_ in (204177, 214518, 231429, 242739, 243834, 244354, 244644, 244690) and
(
(test.from_ = 204177 and test.id > 5341203) OR
(test.from_ = 214518 and test.id > 5336549) OR
(test.from_ = 231429 and test.id > 5338284) OR
(test.from_ = 242739 and test.id > 5339541) OR
(test.from_ = 243834 and test.id > 5340438) OR
(test.from_ = 244354 and test.id > 5337489) OR
(test.from_ = 244644 and test.id > 5338572) OR
(test.from_ = 244690 and test.id > 5338467)
)
)
or
(test.from_ = 244975 and test.direction <> 1 and test.direction <> 3 and test.to_ in (204177, 214518, 231429, 242739, 243834, 244354, 244644, 244690) and
(
(test.to_ = 204177 and test.id > 5341203) OR
(test.to_ = 214518 and test.id > 5336549) OR
(test.to_ = 231429 and test.id > 5338284) OR
(test.to_ = 242739 and test.id > 5339541) OR
(test.to_ = 243834 and test.id > 5340438) OR
(test.to_ = 244354 and test.id > 5337489) OR
(test.to_ = 244644 and test.id > 5338572) OR
(test.to_ = 244690 and test.id > 5338467)
))
or
(test.read_ <> 1 and test.direction <> 2 and test.direction <> 3 and test.to_ = 244975 and test.from_ not in (204177, 214518, 231429, 242739, 243834, 244354, 244644, 244690))
or
(test.read_ <> 1 and test.direction = 2 and test.from_ = 244975 and test.to_ not in (204177, 214518, 231429, 242739, 243834, 244354, 244644, 244690))
)
order by test.id;
The explain plan for this looks very different...
mysql> \. sql_fixed.sql
*************************** 1. row ***************************
id: 1
select_type: PRIMARY
table: <derived2>
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 226
filtered: 100.00
Extra: Using where; Using filesort
*************************** 2. row ***************************
id: 2
select_type: DERIVED
table: test
type: index_merge
possible_keys: one,two
key: two,one
key_len: 4,4
ref: NULL
rows: 226
filtered: 100.00
Extra: Using sort_union(two,one); Using where
2 rows in set, 1 warning (0.01 sec)
The optimizer being smart immediately can see that it doesn't need most of the data because up front we've told it to use an IN
statement with a few keys. Most query optimizers attach a high cost to disk accesses so anything that reduces this it typically preferred by the optimizer.
NOT IN vs IN
not in
and in
are very different. The difference between these in this case is access pattern, do I need the data temporarily or as part of the result set. When you use not in
with a few keys and the index holds millions of keys it might need to read lots of records if the data is part of the result set. Even when using indexes not in
can be reading millions of records from disk... in
with a few keys and these are the keys you need is asking to find and use a small subset. The two access patterns are very different. The following example might help make this clear...
1. I don't want these 10 items from a 1,000,000 records I need the other 999,990, this reads the whole index.
2. I only want these 10 from a 1,000,000 records. This might only require one disk seek.
Number 2 is faster because the of the access pattern ie I found the 10 I need, Nunmber 1. may need to read a million records.
MySQL's query optimizer is seeing this ie the last two OR statements are asking for large subsets of data from either the table or the index ie case 1. above. Seeing that and the fact that it needed to use the primary key anyway the optimizer decided it would be quicker to use the primary key.
When you remove the not in
things change ie now the query planner can use the indexes because in the other two or
clauses they're in effect get me the few from the many
and it performs an index_merge on the two keys that share a to
and a from
column along with the id
.
To see what I mean don't remove the 'not in' part of the query, change it to in
to see what happens, on my machine the query plan changed to use a range index.
If your mySQL version less then 5.0.7, the mysql issue can be reason
Have a look this ticket in MySQL bug tracking https://bugs.mysql.com/bug.php?id=10561
Mixing AND
and OR
would often cause weird query plan on MySQL, in my experience. I don't have enough data on the table to test, but I would try rewriting your query using UNION ALL
. After all, OR
in WHERE
is basically UNION
.
The idea is to break it down on smaller conditions so MySQL can use different index optimized for each part as opposed to jamming all together.
SELECT * FROM (
SELECT
test.id, test.from, test.to, test.message, test.sent, test.read, test.direction
FROM
test
WHERE
test.to = 244975
AND test.direction <> 2
AND test.direction <> 3
AND (
(test.from = 204177 AND test.id > 5341203) OR
(test.from = 214518 AND test.id > 5336549) OR
(test.from = 231429 AND test.id > 5338284) OR
(test.from = 242739 AND test.id > 5339541) OR
(test.from = 243834 AND test.id > 5340438) OR
(test.from = 244354 AND test.id > 5337489) OR
(test.from = 244644 AND test.id > 5338572) OR
(test.from = 244690 AND test.id > 5338467)
)
UNION ALL
SELECT
test.id, test.from, test.to, test.message, test.sent, test.read, test.direction
FROM
test
WHERE
test.from = 244975
AND test.direction <> 1
AND test.direction <> 3
AND (
(test.to = 204177 and test.id > 5341203) OR
(test.to = 214518 and test.id > 5336549) OR
(test.to = 231429 and test.id > 5338284) OR
(test.to = 242739 and test.id > 5339541) OR
(test.to = 243834 and test.id > 5340438) OR
(test.to = 244354 and test.id > 5337489) OR
(test.to = 244644 and test.id > 5338572) OR
(test.to = 244690 and test.id > 5338467)
)
UNION ALL
SELECT
test.id, test.from, test.to, test.message, test.sent, test.read, test.direction
FROM
test
WHERE
test.read <> 1
AND test.direction <> 3
AND test.direction <> 2
AND test.to = 244975
AND test.from NOT IN (204177, 214518, 231429, 242739, 243834, 244354, 244644, 244690)
UNION ALL
SELECT
test.id, test.from, test.to, test.message, test.sent, test.read, test.direction
FROM
test
WHERE
test.read <> 1
AND test.direction = 2
AND test.from = 244975
AND test.to NOT IN (204177, 214518, 231429, 242739, 243834, 244354, 244644, 244690)
) test ORDER BY test.id
It would be good to have a dump of sample data to test with, but I created some of my own anyway. Next I split each of the four outer OR conditions into sub-queries, UNIONed them, and moved the ordering to the final result set.
I've had issues with indexes when using complex WHERE clauses and to me it looks like you have a chat/messaging app and are attempting to get messages to and from a particular user in a single query. Personally, I'd split these into individual queries to simplify the code/queries.
Here is my query:
SELECT test.id, test.from, test.to, test.message, test.sent, test.read, test.direction
FROM (
SELECT *
FROM test
WHERE test.to = 244975
AND test.direction not in (2,3)
AND (
(test.from = 204177 AND test.id > 5341203)
OR (test.from = 214518 AND test.id > 5336549)
OR (test.from = 231429 AND test.id > 5338284)
OR (test.from = 242739 AND test.id > 5339541)
OR (test.from = 243834 AND test.id > 5340438)
OR (test.from = 244354 AND test.id > 5337489)
OR (test.from = 244644 AND test.id > 5338572)
OR (test.from = 244690 AND test.id > 5338467)
)
UNION
SELECT *
FROM test
WHERE test.from = 244975
AND test.direction not in (1,3)
AND (
(test.to = 204177 AND test.id > 5341203)
OR (test.to = 214518 AND test.id > 5336549)
OR (test.to = 231429 AND test.id > 5338284)
OR (test.to = 242739 AND test.id > 5339541)
OR (test.to = 243834 AND test.id > 5340438)
OR (test.to = 244354 AND test.id > 5337489)
OR (test.to = 244644 AND test.id > 5338572)
OR (test.to = 244690 AND test.id > 5338467)
)
UNION
SELECT *
FROM test
WHERE test.read != 1
AND test.direction not in (2,3)
AND test.to = 244975
AND test.from not in (204177, 214518, 231429, 242739, 243834, 244354, 244644, 244690)
UNION
SELECT *
FROM test
WHERE test.read != 1
AND test.direction = 2
AND test.from = 244975
AND test.to not in (204177, 214518, 231429, 242739, 243834, 244354, 244644, 244690)
) test
ORDER BY test.id;
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With