I was trying to optimize NOT IN clause in mysql: Some how I ended up in the following query:
SELECT @i:=(SELECT correct_option_word_id FROM sent_question WHERE msisdn='abc');
SELECT * FROM word WHERE @i IS NULL OR word_id NOT IN (@i);
There is no relationship between sent_question table and word table. And also I cannot place index on correct_option_word_id.
Can somebody please explain, will this method even optimize the query or not?
UPDATE: As mentioned here that both the methods: NOT IN and LEFT JOIN/IS NULL are almost equally efficient. That's why I don't want to use LEFT JOIN/IS NULL method.
UPDATE 2: Explain results for original query:
EXPLAIN SELECT * FROM word WHERE word_id NOT IN (SELECT correct_option_word_id FROM sent_question WHERE msisdn='abc');
+----+--------------------+---------------+------+-------------------------+-------------------------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+---------------+------+-------------------------+-------------------------+---------+-------+------+-------------+
| 1 | PRIMARY | word | ALL | NULL | NULL | NULL | NULL | 10 | Using where |
| 2 | DEPENDENT SUBQUERY | sent_question | ref | fk_question_subscriber1 | fk_question_subscriber1 | 48 | const | 1 | Using where |
+----+--------------------+---------------+------+-------------------------+-------------------------+---------+-------+------+-------------+
You're right in that both the NOT IN and LEFT JOIN/IS NULL method are equally efficient, however, unfortunately, there is no faster option, only slower ones (NOT EXISTS).
Here's your query, simplified:
SELECT *
FROM word
WHERE
word_id NOT IN (SELECT correct_option_word_id FROM sent_question WHERE msisdn='abc')
As you know, MySQL will do the subquery first and use the returned result set for the NOT IN clause. Then, it will scan through all of the rows in word to see if word_id is in the list for each row.
Unfortunately for this case, indexes are inclusive, not exclusive. They don't help with NOT queries. A covering index on word could potentially still be used to avoid accessing the actual table, and provide some IO benefits, but it won't be used in the traditional "lookup" sense. However, since you are returning all columns on the word table, it may not be viable to have such a large index.
The most important index that will be used here is an index on sent_question.msisdn for the subquery. Ensure that you have that index defined. A multi-column "covering" index on (msisdn, correct_option_word_id) would be best.
If you share your design, we can probably offer some design solutions for optimization.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With