Why is INTERSECT as slow as a nested JOIN?

Q: How is INTERSECT different from an INNER JOIN?

They are very different, even in your case. The INNER JOIN returns duplicates if id is duplicated in either table; INTERSECT removes duplicates. An INNER JOIN on id never matches rows whose id is NULL, because NULL = NULL is not true, whereas INTERSECT treats two NULLs as equal and returns a NULL row when both inputs contain one.
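A minimal sketch of both differences, using Python's built-in sqlite3 module as a stand-in for SQL Server (both treat NULLs as equal inside INTERSECT). The tables `a` and `b` and their contents are hypothetical:

```python
import sqlite3


def compare_join_and_intersect():
    """Run the same toy data through INNER JOIN and INTERSECT."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE a (id INTEGER);
        CREATE TABLE b (id INTEGER);
        INSERT INTO a VALUES (1), (1), (2), (NULL);  -- 1 is duplicated in a
        INSERT INTO b VALUES (1), (3), (NULL);
        """
    )
    # INNER JOIN: one output row per matching pair, so the duplicate 1 survives;
    # NULL = NULL is not true, so the NULL rows never match.
    joined = [r[0] for r in conn.execute(
        "SELECT a.id FROM a INNER JOIN b ON a.id = b.id ORDER BY a.id")]
    # INTERSECT: distinct rows present in both inputs; NULLs compare as equal.
    intersected = [r[0] for r in conn.execute(
        "SELECT id FROM a INTERSECT SELECT id FROM b ORDER BY id")]
    conn.close()
    return joined, intersected


joined, intersected = compare_join_and_intersect()
print(joined)       # [1, 1]
print(intersected)  # [None, 1]
```

SQLite sorts NULLs first in ascending order, which is why `None` leads the INTERSECT result.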

Q: Are table joins slow?

Joins can be slower than avoiding them through de-normalisation, but if used correctly (joining on columns with appropriate indexes and so on) they are not inherently slow.
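Whether a join actually uses an appropriate index can be checked by asking the optimizer for its plan. A minimal sketch, using SQLite's EXPLAIN QUERY PLAN as a stand-in (SQL Server exposes the same information through its execution plans); the `orders`/`customers` tables and the `idx_customers_id` index are hypothetical:

```python
import sqlite3


def join_plan(with_index):
    """Return SQLite's EXPLAIN QUERY PLAN detail lines for a join on customer_id."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE orders (order_id INTEGER, customer_id INTEGER);
        CREATE TABLE customers (customer_id INTEGER, name TEXT);
        """
    )
    if with_index:
        # The "appropriate index": on the column the join probes.
        conn.execute("CREATE INDEX idx_customers_id ON customers (customer_id)")
    rows = conn.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT o.order_id, c.name FROM orders o "
        "JOIN customers c ON c.customer_id = o.customer_id"
    ).fetchall()
    conn.close()
    # The human-readable description is the last column of each plan row.
    return [row[-1] for row in rows]


for flag in (False, True):
    print(flag, join_plan(flag))
```

With the index, the plan typically contains a SEARCH step using `idx_customers_id`; without it, SQLite falls back to a scan or builds a temporary automatic index. The exact plan wording varies between SQLite versions.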

Q: Are joins fast?

A join often executes faster than an equivalent subquery, although modern optimizers frequently produce the same plan for both, so measure rather than assume. Joins can also reduce the total work sent to the database: one join query instead of multiple separate queries.
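The "one join query instead of multiple queries" point can be sketched with Python's sqlite3 and two hypothetical tables: the first function issues one extra query per order, the second gets the same rows in a single round trip:

```python
import sqlite3


def setup():
    """Build a small in-memory database with hypothetical customers and orders."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
        INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bo');
        INSERT INTO orders VALUES (10, 1), (11, 2), (12, 1);
        """
    )
    return conn


def names_one_query_per_order(conn):
    """One query for the orders, then one more query per order."""
    result = []
    orders = conn.execute(
        "SELECT order_id, customer_id FROM orders ORDER BY order_id").fetchall()
    for order_id, customer_id in orders:
        (name,) = conn.execute(
            "SELECT name FROM customers WHERE customer_id = ?",
            (customer_id,)).fetchone()
        result.append((order_id, name))
    return result


def names_single_join(conn):
    """The same answer from a single join query."""
    return conn.execute(
        "SELECT o.order_id, c.name FROM orders o "
        "JOIN customers c ON c.customer_id = o.customer_id "
        "ORDER BY o.order_id").fetchall()


conn = setup()
print(names_one_query_per_order(conn))  # [(10, 'Ann'), (11, 'Bo'), (12, 'Ann')]
print(names_single_join(conn))          # [(10, 'Ann'), (11, 'Bo'), (12, 'Ann')]
```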

Q: What can I use instead of INTERSECT in SQL?

Before version 8.0.31, MySQL had no INTERSECT operator, but you can easily simulate this type of query with either an IN clause or an EXISTS clause, depending on the complexity of the INTERSECT query. An INTERSECT query returns the rows that appear in every one of two or more result sets.
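A minimal sketch comparing the three forms, run in SQLite through Python's sqlite3 (SQLite supports INTERSECT, so it can check that the IN and EXISTS rewrites agree with it). The tables `list_a` and `list_b` are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE list_a (id INTEGER);
    CREATE TABLE list_b (id INTEGER);
    INSERT INTO list_a VALUES (1), (2), (2), (3);
    INSERT INTO list_b VALUES (2), (3), (4);
    """
)

queries = {
    # Native set operator.
    "intersect": "SELECT id FROM list_a INTERSECT SELECT id FROM list_b ORDER BY id",
    # Emulation with IN; DISTINCT reproduces INTERSECT's duplicate removal.
    "in": "SELECT DISTINCT id FROM list_a WHERE id IN (SELECT id FROM list_b) ORDER BY id",
    # Emulation with a correlated EXISTS.
    "exists": (
        "SELECT DISTINCT a.id FROM list_a a "
        "WHERE EXISTS (SELECT 1 FROM list_b b WHERE b.id = a.id) ORDER BY a.id"
    ),
}
results = {name: [r[0] for r in conn.execute(sql)] for name, sql in queries.items()}
print(results)  # {'intersect': [2, 3], 'in': [2, 3], 'exists': [2, 3]}
```

The rewrites still differ on NULLs: IN and EXISTS with `=` never match a NULL, while INTERSECT treats two NULLs as equal.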

Tags: algorithm, sql, join, intersect, query-optimization

I'm using MS SQL.

I have a huge table with indices to make this query fast:

select userid from IncrementalStatistics where
IncrementalStatisticsTypeID = 5 and
IncrementalStatistics.AssociatedPlaceID = 47828 and
IncrementalStatistics.Created > '12/2/2010'

It returns in less than 1 second. The table has billions of rows. There are only around 10000 results.

I would expect this query to also complete in about a second:

select userid from IncrementalStatistics where
IncrementalStatisticsTypeID = 5 and
IncrementalStatistics.AssociatedPlaceID = 47828 and
IncrementalStatistics.Created > '12/2/2010'

intersect

select userid from IncrementalStatistics where
IncrementalStatisticsTypeID = 5 and
IncrementalStatistics.AssociatedPlaceID = 40652 and
IncrementalStatistics.Created > '12/2/2010'

intersect

select userid from IncrementalStatistics where
IncrementalStatisticsTypeID = 5 and
IncrementalStatistics.AssociatedPlaceID = 14403 and
IncrementalStatistics.Created > '12/2/2010'

But it takes 20 seconds. All the individual queries take < 1 second and return around 10k results.

I would expect SQL internally to throw the results from each of these subqueries into a hash table and do a hash intersection, which should be O(n). The result sets are small enough to fit in memory, so I doubt it's an I/O issue.
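The hash intersection described here can be modelled in a few lines of Python. This is only a model of the idea, not what SQL Server does internally; its optimizer may pick hash, merge, or nested-loops joins depending on its estimates. The userid lists are hypothetical:

```python
def hash_intersect(*result_sets):
    """Return the distinct values present in every input list (INTERSECT semantics).

    Each set() build is linear in its input and each membership test is O(1)
    on average, so the total work is O(n) in the number of input rows.
    """
    if not result_sets:
        return set()
    # Start from the smallest input: the running intersection can only shrink.
    ordered = sorted(result_sets, key=len)
    common = set(ordered[0])
    for rows in ordered[1:]:
        common &= set(rows)
        if not common:
            break  # nothing can survive the remaining inputs
    return common


# Hypothetical userid lists standing in for the three subquery results.
place_47828 = [101, 102, 103, 103]
place_40652 = [102, 103, 104]
place_14403 = [103, 102, 105]
print(sorted(hash_intersect(place_47828, place_40652, place_14403)))  # [102, 103]
```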

I wrote an alternate query that is just a series of nested JOINs and this also takes around 20 seconds, which makes sense.
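The nested-JOIN query isn't shown, so the following is only one plausible shape for it: a self-join over a hypothetical, simplified `stats` table (the type and date filters are dropped). The sketch runs in SQLite via Python's sqlite3, checks that it returns the same users as the INTERSECT chain, and shows why it needs DISTINCT:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE stats (userid INTEGER, place_id INTEGER);
    INSERT INTO stats VALUES
        (1, 47828), (1, 40652), (1, 14403),
        (2, 47828), (2, 40652),
        (3, 47828), (3, 47828), (3, 40652), (3, 14403);
    """
)

# Three-way INTERSECT, as in the question (minus the other filters).
intersect_sql = """
    SELECT userid FROM stats WHERE place_id = 47828
    INTERSECT SELECT userid FROM stats WHERE place_id = 40652
    INTERSECT SELECT userid FROM stats WHERE place_id = 14403
    ORDER BY userid
"""

# Self-join form: each extra place is another join back to the same table.
join_sql = """
    SELECT {distinct} s1.userid
    FROM stats s1
    JOIN stats s2 ON s2.userid = s1.userid AND s2.place_id = 40652
    JOIN stats s3 ON s3.userid = s1.userid AND s3.place_id = 14403
    WHERE s1.place_id = 47828
    ORDER BY s1.userid
"""


def run(sql):
    return [row[0] for row in conn.execute(sql)]


intersect_result = run(intersect_sql)
join_result = run(join_sql.format(distinct="DISTINCT"))
join_no_distinct = run(join_sql.format(distinct=""))
print(intersect_result)  # [1, 3]
print(join_result)       # [1, 3]
print(join_no_distinct)  # [1, 3, 3]
```

User 3 has two rows at place 47828, so without DISTINCT the join returns that user twice.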

Why is INTERSECT so slow? Does it reduce to a JOIN at an early stage of the query processing?


asked Dec 06 '10 22:12

John Shedletsky

1 Answer

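Give this a try instead. Untested obviously, but I think it will get you the results you want: it reads the rows for all three places with one IN filter, groups them by userid, and keeps only the users that appear at all three places.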

select userid 
    from IncrementalStatistics 
    where IncrementalStatisticsTypeID = 5 
        and IncrementalStatistics.AssociatedPlaceID in (47828,40652,14403)  
        and IncrementalStatistics.Created > '12/2/2010'
    group by userid
    having count(distinct IncrementalStatistics.AssociatedPlaceID) = 3
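A small check of this technique, run in SQLite through Python's sqlite3 on a hypothetical, simplified `stats` table (type and date filters dropped). It also shows why the HAVING clause counts DISTINCT place IDs rather than rows: a user logged three times at one place would otherwise pass.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE stats (userid INTEGER, place_id INTEGER);
    INSERT INTO stats VALUES
        (1, 47828), (1, 40652), (1, 14403),  -- seen at all three places
        (2, 47828), (2, 40652),              -- never seen at 14403
        (4, 47828), (4, 47828), (4, 47828);  -- three rows, but one place
    """
)


def run(sql):
    return [row[0] for row in conn.execute(sql)]


# One pass over the filtered rows, grouped per user.
group_sql = """
    SELECT userid FROM stats
    WHERE place_id IN (47828, 40652, 14403)
    GROUP BY userid
    HAVING COUNT({what}) = 3
    ORDER BY userid
"""

intersect_result = run("""
    SELECT userid FROM stats WHERE place_id = 47828
    INTERSECT SELECT userid FROM stats WHERE place_id = 40652
    INTERSECT SELECT userid FROM stats WHERE place_id = 14403
""")
group_result = run(group_sql.format(what="DISTINCT place_id"))
naive_result = run(group_sql.format(what="*"))
print(intersect_result)  # [1]
print(group_result)      # [1]
print(naive_result)      # [1, 4]
```

The `= 3` has to match the number of IDs in the IN list. Whether this form is faster than the INTERSECT chain on the real table depends on the plan SQL Server picks, so it is worth comparing the actual execution plans.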


answered Oct 23 '22 09:10

Joe Stefanelli
