I'm using NOT IN inside my SQL query. For example: <pre class="prettyprint"><code>select columnA from table1 where columnA not in ( select columnB from table2) </code></pre> How is it possible that this part of the query <pre class="prettyprint"><code>select columnB from table2 </code></pre> takes 30sec to complete, but the whole query above takes 0.1sec to complete?? Shouldn't the complete query take 30sec + ? BTW, both queries return valid results. Thanks! Answers to Comments <blockquote> Is it because the second query hasn't actually completed but has only returned back the first 'x' rows (out of a very large table?) </blockquote> No, the query is completed after 30 seconds, not to many rows returned (eg. 50). <blockquote> But @Aleksandar wondered why the question congaing the performance killer was so fast. </blockquote> my point exactly <blockquote> Also how long does select distinct columnB from table2 take to execute? </blockquote> actually, the original query is "select distinct...

It's because the query optimiser turns the query to something that looks completely different. The actual query should be the same as what's produced by a query like this: <pre class="prettyprint"><code>select columnA from table1 left join table2 on ColumnA = ColumnB where ColumnB is null </code></pre> If the database can use indexes to join the tables, perhaps it doesn't have to query the entire table2, or even touch the table itself.

SQL - query inside NOT IN takes longer than the complete query?

Tags:

sql

oracle

I'm using NOT IN inside my SQL query.

For example:

select columnA 
from table1
where columnA not in (
select columnB
from table2)

How is it possible that this part of the query

select columnB
from table2

takes 30sec to complete, but the whole query above takes 0.1sec to complete?? Shouldn't the complete query take 30sec + ?

BTW, both queries return valid results.

Thanks!

Answers to Comments

Is it because the second query hasn't actually completed but has only returned back the first 'x' rows (out of a very large table?)

No, the query is completed after 30 seconds, not to many rows returned (eg. 50).

But @Aleksandar wondered why the question congaing the performance killer was so fast.

my point exactly

Also how long does select distinct columnB from table2 take to execute?

actually, the original query is "select distinct...

694

asked Jan 05 '11 13:01

Alex

3 Answers

It seems you are thinking that your main query implies the following steps:

(1)  Run the subquery
(2)  Check each row in table1 against the result set from the subquery.

Therefore, you think that running the subquery separately must take less time than running the whole query.

But SQL is not a procedural language, and the structure of the query does not necessarily imply the steps that will be followed to execute the query.

As Guffa answered, the optimizer will come up with (what it believes is) an optimal plan to execute each query. These execution plans are not always obvious from looking at the query, and in some cases can indeed be quite counter-intuitive.

I think that it is most likely, in this case, that the optimizer has come up with a quicker method for checking whether a value exists in table2 than simply querying all of table2 at once. It could be the transformation Guffa showed (although that still does not tell you the exact execution plan being used).

I would guess that table1 has significantly fewer rows than table2, and an index exists on table2.columnB. So all it has to do is fetch the rows from table1, then probe the index for each of those value to check for existence. But this is only one possibility.

Also, as Michael Buen pointed out, differences in the size of the result set returned can also impact your perceived performance. My intuition is that this is secondary to the execution plan differences, but it can be significant.

104

answered Nov 15 '22 06:11

Dave Costa

It's because the query optimiser turns the query to something that looks completely different. The actual query should be the same as what's produced by a query like this:

select columnA 
from table1
left join table2 on ColumnA = ColumnB
where ColumnB is null

If the database can use indexes to join the tables, perhaps it doesn't have to query the entire table2, or even touch the table itself.

answered Nov 15 '22 05:11

Guffa

A dramatic comparison, let's say this...

select columnB
from table2

...has billion rows (30 seconds), many data travels the wire and presented to user.

And this...

select columnA 
from table1

...has only one row.

RDBMS won't do a dumb job of pulling the data of table2 from server to client if you don't intend to display the data of table2. So no network bandwidth or I/O will be involved much when doing data presence testing, it all happens at the server, the only thing that would be pulled from server to client is the one row of table1 only.

select columnA 
from table1
where columnA not in (
select columnB
from table2)

And things shall be especially fast if your columnA and columnB happen to have an index

Things that would make a database operation slow are twofold: first is when you pull too much data from the server to client, second is when you don't have an index on pertinent fields

answered Nov 15 '22 04:11

Michael Buen

Related questions
                            
                                Combine XML from T-SQL
                            
                                Trigger is invalid and failed re-validation
                            
                                SQL Server: finding the bad data
                            
                                SQL conundrum, how to select latest date for part, but only 1 row per part (unique)
                            
                                Netezza Sql query
                            
                                Django getting executable raw sql for a QuerySet
                            
                                How to concatenate row values for use in WHERE clause of T-SQL query
                            
                                Stored procedure syntax Error(MSSQL)
                            
                                View Expansion in Oracle
                            
                                Primary key violation constraint
                            
                                Compare when value could be both NULL or text
                            
                                Drop Unique Constraint on Table Column without Knowing the Constraint Name
                            
                                How do I have to import products in Magento using a direct connection to the database
                            
                                Subquery not in performance question
                            
                                SQL sum( ) over different periods?
                            
                                Dividing a list of numbers into roughtly equal totals
                            
                                SQL change column data type with lots of data
                            
                                Does Hibernate update an entity that has not changed?
                            
                                A problem connecting to a MySQL DB using JDBC
                            
                                sql trigger inserting row into two tables

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With