Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is INTERSECT as slow as a nested JOIN?

I'm using MS SQL.

I have a huge table with indices to make this query fast:

select userid from IncrementalStatistics where
IncrementalStatisticsTypeID = 5 and
IncrementalStatistics.AssociatedPlaceID = 47828 and
IncrementalStatistics.Created > '12/2/2010

It returns in less than 1 second. The table has billions of rows. There are only around 10000 results.

I would expect this query to also complete in about a second:

select userid from IncrementalStatistics where
IncrementalStatisticsTypeID = 5 and
IncrementalStatistics.AssociatedPlaceID = 47828 and
IncrementalStatistics.Created > '12/2/2010'

intersect

select userid from IncrementalStatistics where
IncrementalStatisticsTypeID = 5 and
IncrementalStatistics.AssociatedPlaceID = 40652 and
IncrementalStatistics.Created > '12/2/2010'

intersect

select userid from IncrementalStatistics where
IncrementalStatisticsTypeID = 5 and
IncrementalStatistics.AssociatedPlaceID = 14403 and
IncrementalStatistics.Created > '12/2/2010'

But it takes 20 seconds. All the individual queries take < 1 second and return around 10k results.

I would expect SQL internally to throw the results from each of these subqueries into a hashtable and do a hash-intersection - should be O(n). The result sets are big enough to fit in memory, so I doubt it's an IO issue.

I wrote an alternate query that is just a series of nested JOINs and this also takes around 20 seconds, which makes sense.

Why is INTERSECT so slow? Does it reduce to a JOIN at an early stage of the query processing?

like image 628
John Shedletsky Avatar asked Dec 06 '10 22:12

John Shedletsky


People also ask

How is INTERSECT difference from inner join?

They are very different, even in your case. The INNER JOIN will return duplicates, if id is duplicated in either table. INTERSECT removes duplicates. The INNER JOIN will never return NULL , but INTERSECT will return NULL .

Are table joins slow?

Joins can be slower than avoiding them through de-normalisation but if used correctly (joining on columns with appropriate indexes an so on) they are not inherently slow.

Are joins fast?

The advantage of a join includes that it executes faster. The retrieval time of the query using joins almost always will be faster than that of a subquery. By using joins, you can maximize the calculation burden on the database i.e., instead of multiple queries using one join query.

What can I use instead of INTERSECT in SQL?

Although there is no INTERSECT operator in MySQL, you can easily simulate this type of query using either the IN clause or the EXISTS clause, depending on the complexity of the INTERSECT query. First, let's explain what an INTERSECT query is. An INTERSECT query returns the intersection of 2 or more datasets.


1 Answers

Give this a try instead. Untested obviously, but I think it will get you the results you want.

select userid 
    from IncrementalStatistics 
    where IncrementalStatisticsTypeID = 5 
        and IncrementalStatistics.AssociatedPlaceID in (47828,40652,14403)  
        and IncrementalStatistics.Created > '12/2/2010'
    group by userid
    having count(distinct IncrementalStatistics.AssociatedPlaceID) = 3
like image 113
Joe Stefanelli Avatar answered Oct 23 '22 09:10

Joe Stefanelli