The two queries below both use a subquery. They return the same result and both work fine for me. The problem is that the Method 1 query takes about 10 seconds to execute, while the Method 2 query takes under 1 second.
I was able to convert the Method 1 query to Method 2, but I don't understand what is happening in the query. I have been trying to figure it out myself. I would really like to learn what the difference between the two queries is and where the performance gain comes from. What's the logic behind it?
I'm new to these advanced techniques and hope someone can help me out here. I have read the docs, but they did not give me a clue.
Method 1:
SELECT *
FROM tracker
WHERE reservation_id IN (
    SELECT reservation_id
    FROM tracker
    GROUP BY reservation_id
    HAVING ( method = 1 AND type = 0 AND COUNT(*) > 1 )
        OR ( method = 1 AND type = 1 AND COUNT(*) > 1 )
        OR ( method = 2 AND type = 2 AND COUNT(*) > 0 )
        OR ( method = 3 AND type = 0 AND COUNT(*) > 0 )
        OR ( method = 3 AND type = 1 AND COUNT(*) > 1 )
        OR ( method = 3 AND type = 3 AND COUNT(*) > 0 )
)
Method 2:
SELECT *
FROM `tracker` t
WHERE EXISTS (
    SELECT reservation_id
    FROM `tracker` t3
    WHERE t3.reservation_id = t.reservation_id
    GROUP BY reservation_id
    HAVING ( METHOD = 1 AND TYPE = 0 AND COUNT(*) > 1 )
        OR ( METHOD = 1 AND TYPE = 1 AND COUNT(*) > 1 )
        OR ( METHOD = 2 AND TYPE = 2 AND COUNT(*) > 0 )
        OR ( METHOD = 3 AND TYPE = 0 AND COUNT(*) > 0 )
        OR ( METHOD = 3 AND TYPE = 1 AND COUNT(*) > 1 )
        OR ( METHOD = 3 AND TYPE = 3 AND COUNT(*) > 0 )
)
As a rule of thumb: EXISTS is much faster than IN when the subquery result is very large. Conversely, IN is faster than EXISTS when the subquery result is very small. The actual plan still depends on the MySQL version and its optimizer. The two also differ in how they handle NULL: a comparison against NULL inside IN evaluates to NULL (unknown) rather than true or false, whereas EXISTS only checks whether the subquery returns a row, so it always yields true or false.
It is good practice to avoid multiple levels of nested subqueries, since they are hard to read and tend to perform poorly. In general, it is better to write a query with JOINs rather than with subqueries where possible, especially if the subqueries are correlated.
Advantages of joins: a query using joins will almost always retrieve its result faster than the equivalent subquery. Joins also let you put the calculation burden on the database, i.e. one join query instead of multiple queries.
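To make the JOIN suggestion concrete, here is a minimal sketch that runs an IN, an EXISTS and a JOIN version of the same filter and checks that they agree. Everything in it is an assumption for illustration: it uses Python's built-in `sqlite3` as a stand-in for MySQL, a toy `tracker` table with made-up rows, and a simplified `HAVING COUNT(*) > 1` condition instead of the full condition from the question. It says nothing about timing, only about the shape of the rewrite.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tracker ("
    "id INTEGER PRIMARY KEY, reservation_id INTEGER, method INTEGER, type INTEGER)"
)
# Made-up sample rows: reservations 100 and 300 each have two tracker rows.
conn.executemany(
    "INSERT INTO tracker (reservation_id, method, type) VALUES (?, ?, ?)",
    [(100, 1, 0), (100, 1, 0), (200, 1, 1), (300, 2, 2), (300, 2, 2), (400, 3, 0)],
)

# Simplified filter: reservations that have more than one tracker row.
in_sql = """
SELECT * FROM tracker
WHERE reservation_id IN (
    SELECT reservation_id FROM tracker
    GROUP BY reservation_id HAVING COUNT(*) > 1
)
ORDER BY id
"""

exists_sql = """
SELECT * FROM tracker t
WHERE EXISTS (
    SELECT 1 FROM tracker t3
    WHERE t3.reservation_id = t.reservation_id
    GROUP BY t3.reservation_id HAVING COUNT(*) > 1
)
ORDER BY t.id
"""

# JOIN against a derived table that holds the qualifying reservation ids.
join_sql = """
SELECT t.* FROM tracker t
JOIN (
    SELECT reservation_id FROM tracker
    GROUP BY reservation_id HAVING COUNT(*) > 1
) dup ON dup.reservation_id = t.reservation_id
ORDER BY t.id
"""

in_rows = conn.execute(in_sql).fetchall()
exists_rows = conn.execute(exists_sql).fetchall()
join_rows = conn.execute(join_sql).fetchall()

print(in_rows == exists_rows == join_rows)
print([row[0] for row in join_rows])
```

The derived table `dup` is computed once and joined, which is the form the advice above points towards. Whether it beats EXISTS on the real table is something only the query plan on your own data can tell you.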
An EXPLAIN plan would have shown you exactly why you should use EXISTS. Usually the question comes up as EXISTS vs COUNT(*). EXISTS is faster. Why?
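As a rough sketch of how one might compare the plans, the snippet below asks the database for the plan of both query shapes. It rests on several assumptions: Python's built-in `sqlite3` stands in for MySQL, so it uses SQLite's `EXPLAIN QUERY PLAN` rather than MySQL's `EXPLAIN`; the index `idx_tracker_reservation` is hypothetical and not from the question; and the HAVING condition is simplified. The plan text differs between engines and versions, so no particular output is implied here.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tracker ("
    "id INTEGER PRIMARY KEY, reservation_id INTEGER, method INTEGER, type INTEGER)"
)
# Hypothetical index, added only so the planner has something to choose from.
conn.execute("CREATE INDEX idx_tracker_reservation ON tracker (reservation_id)")


def show_plan(sql):
    """Return the human-readable plan steps SQLite reports for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each plan row is the textual description.
    return [row[-1] for row in rows]


in_plan = show_plan(
    "SELECT * FROM tracker WHERE reservation_id IN ("
    "SELECT reservation_id FROM tracker "
    "GROUP BY reservation_id HAVING COUNT(*) > 1)"
)
exists_plan = show_plan(
    "SELECT * FROM tracker t WHERE EXISTS ("
    "SELECT 1 FROM tracker t3 WHERE t3.reservation_id = t.reservation_id "
    "GROUP BY t3.reservation_id HAVING COUNT(*) > 1)"
)

for step in in_plan:
    print("IN:    ", step)
for step in exists_plan:
    print("EXISTS:", step)
```

On MySQL itself the equivalent step is to prefix each of the two queries with `EXPLAIN` and compare the rows it reports for the subquery.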
With regard to the challenges presented by NULL: when the subquery returns NULL values and nothing matches, the IN comparison evaluates to NULL instead of false, so you need to handle that as well. With EXISTS, the result is merely false, which is much easier to cope with. Put simply, IN can't compare anything with NULL, while EXISTS only tests whether a row exists and never yields NULL.
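The NULL behaviour described above can be reproduced with a small sketch. It uses Python's built-in `sqlite3` as a stand-in for MySQL (an assumption; both follow the standard three-valued logic for these cases), and the tables `a` and `b` are hypothetical. The most common place this bites in practice is the negated form, NOT IN versus NOT EXISTS, so that is included as well.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")

# IN with a NULL in the list: a match is true, a non-match is NULL (None), not false.
in_match = conn.execute("SELECT 1 IN (1, NULL)").fetchone()[0]
in_no_match = conn.execute("SELECT 2 IN (1, NULL)").fetchone()[0]

# EXISTS over an empty result is plain false (0), never NULL.
exists_empty = conn.execute("SELECT EXISTS (SELECT 1 WHERE 1 = 0)").fetchone()[0]

# Hypothetical tables: b contains a NULL.
conn.execute("CREATE TABLE a (x INTEGER)")
conn.execute("CREATE TABLE b (x INTEGER)")
conn.executemany("INSERT INTO a (x) VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO b (x) VALUES (?)", [(1,), (None,)])

# NOT IN: the NULL in b makes the test unknown for 2 and 3, so no rows pass.
not_in_rows = conn.execute(
    "SELECT x FROM a WHERE x NOT IN (SELECT x FROM b) ORDER BY x"
).fetchall()

# NOT EXISTS: only asks whether a matching row exists, so 2 and 3 pass.
not_exists_rows = conn.execute(
    "SELECT x FROM a WHERE NOT EXISTS (SELECT 1 FROM b WHERE b.x = a.x) ORDER BY x"
).fetchall()

print(in_match, in_no_match, exists_empty)
print(not_in_rows, not_exists_rows)
```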
E.g. EXISTS (SELECT * FROM yourtable WHERE bla = 'blabla');
You get true/false the moment one hit is found/matched.
In this case IN sort of takes the position of COUNT(*): it selects ALL matching rows based on the WHERE, because it is comparing all values.
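The difference between stopping at the first hit and looking at every row can be illustrated with a toy analogy in plain Python. This is not how MySQL executes either operator; it is only a sketch of the idea, with a made-up list of values and a hypothetical `scan` helper that records how many rows were examined.

```python
def scan(rows, seen):
    """Yield rows one at a time, recording each row that was examined."""
    for row in rows:
        seen.append(row)
        yield row


# Made-up column values; 'blabla' first appears in the second position.
rows = ["x", "blabla", "y", "blabla", "z"]

# EXISTS-like check: any() stops as soon as one match is found.
exists_seen = []
exists_result = any(value == "blabla" for value in scan(rows, exists_seen))

# COUNT(*)-like check: every row is examined before the answer is known.
count_seen = []
count_result = sum(1 for value in scan(rows, count_seen) if value == "blabla") > 0

print(exists_result, len(exists_seen))
print(count_result, len(count_seen))
```

Both checks give the same answer, but the first one examined 2 rows and the second one examined all 5.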
But don't forget this either:
EXISTS outperforms IN when the subquery result is very large.
IN gets ahead of EXISTS when the subquery result is very small.
Method 2 is fast because it uses the EXISTS operator, for which MySQL does not load any results. As mentioned in your docs link as well, it ignores whatever is in the SELECT clause. It only checks for the first row that matches the criteria; once found, it sets the condition to TRUE and moves on to further processing.
Method 1, on the other hand, uses the IN operator, which loads all possible values and then matches against them. The condition is set to TRUE only when an exact match is found, which is a time-consuming process.
Hence your Method 2 is fast.
Hope it helps...