I'm using NOT IN inside my SQL query.
For example:
select columnA
from table1
where columnA not in (
select columnB
from table2)
How is it possible that this part of the query
select columnB
from table2
takes 30sec to complete, but the whole query above takes 0.1sec to complete?? Shouldn't the complete query take 30sec + ?
BTW, both queries return valid results.
Thanks!
Answers to Comments
Is it because the second query hasn't actually completed but has only returned back the first 'x' rows (out of a very large table?)
No, the query is completed after 30 seconds, not to many rows returned (eg. 50).
But @Aleksandar wondered why the question congaing the performance killer was so fast.
my point exactly
Also how long does select distinct columnB from table2 take to execute?
actually, the original query is "select distinct...
Queries can become slow for various reasons ranging from improper index usage to bugs in the storage engine itself. However, in most cases, queries become slow because developers or MySQL database administrators neglect to monitor them and keep an eye on their performance.
When using NOT IN , the subquery returns a list of zero or more values in the outer query where the comparison column does not match any of the values returned from the subquery. Using the previous example, NOT IN returns all the products that are not supplied from MA.
It seems you are thinking that your main query implies the following steps:
(1) Run the subquery
(2) Check each row in table1 against the result set from the subquery.
Therefore, you think that running the subquery separately must take less time than running the whole query.
But SQL is not a procedural language, and the structure of the query does not necessarily imply the steps that will be followed to execute the query.
As Guffa answered, the optimizer will come up with (what it believes is) an optimal plan to execute each query. These execution plans are not always obvious from looking at the query, and in some cases can indeed be quite counter-intuitive.
I think that it is most likely, in this case, that the optimizer has come up with a quicker method for checking whether a value exists in table2 than simply querying all of table2 at once. It could be the transformation Guffa showed (although that still does not tell you the exact execution plan being used).
I would guess that table1 has significantly fewer rows than table2, and an index exists on table2.columnB. So all it has to do is fetch the rows from table1, then probe the index for each of those value to check for existence. But this is only one possibility.
Also, as Michael Buen pointed out, differences in the size of the result set returned can also impact your perceived performance. My intuition is that this is secondary to the execution plan differences, but it can be significant.
It's because the query optimiser turns the query to something that looks completely different. The actual query should be the same as what's produced by a query like this:
select columnA
from table1
left join table2 on ColumnA = ColumnB
where ColumnB is null
If the database can use indexes to join the tables, perhaps it doesn't have to query the entire table2, or even touch the table itself.
A dramatic comparison, let's say this...
select columnB
from table2
...has billion rows (30 seconds), many data travels the wire and presented to user.
And this...
select columnA
from table1
...has only one row.
RDBMS won't do a dumb job of pulling the data of table2 from server to client if you don't intend to display the data of table2. So no network bandwidth or I/O will be involved much when doing data presence testing, it all happens at the server, the only thing that would be pulled from server to client is the one row of table1 only.
select columnA
from table1
where columnA not in (
select columnB
from table2)
And things shall be especially fast if your columnA and columnB happen to have an index
Things that would make a database operation slow are twofold: first is when you pull too much data from the server to client, second is when you don't have an index on pertinent fields
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With