I'm using MS SQL.
I have a huge table with indices to make this query fast:
select userid from IncrementalStatistics where
IncrementalStatisticsTypeID = 5 and
IncrementalStatistics.AssociatedPlaceID = 47828 and
IncrementalStatistics.Created > '12/2/2010
It returns in less than 1 second. The table has billions of rows. There are only around 10000 results.
I would expect this query to also complete in about a second:
select userid from IncrementalStatistics where
IncrementalStatisticsTypeID = 5 and
IncrementalStatistics.AssociatedPlaceID = 47828 and
IncrementalStatistics.Created > '12/2/2010'
intersect
select userid from IncrementalStatistics where
IncrementalStatisticsTypeID = 5 and
IncrementalStatistics.AssociatedPlaceID = 40652 and
IncrementalStatistics.Created > '12/2/2010'
intersect
select userid from IncrementalStatistics where
IncrementalStatisticsTypeID = 5 and
IncrementalStatistics.AssociatedPlaceID = 14403 and
IncrementalStatistics.Created > '12/2/2010'
But it takes 20 seconds. All the individual queries take < 1 second and return around 10k results.
I would expect SQL internally to throw the results from each of these subqueries into a hashtable and do a hash-intersection - should be O(n). The result sets are big enough to fit in memory, so I doubt it's an IO issue.
I wrote an alternate query that is just a series of nested JOINs and this also takes around 20 seconds, which makes sense.
Why is INTERSECT so slow? Does it reduce to a JOIN at an early stage of the query processing?
They are very different, even in your case. The INNER JOIN will return duplicates, if id is duplicated in either table. INTERSECT removes duplicates. The INNER JOIN will never return NULL , but INTERSECT will return NULL .
Joins can be slower than avoiding them through de-normalisation but if used correctly (joining on columns with appropriate indexes an so on) they are not inherently slow.
The advantage of a join includes that it executes faster. The retrieval time of the query using joins almost always will be faster than that of a subquery. By using joins, you can maximize the calculation burden on the database i.e., instead of multiple queries using one join query.
Although there is no INTERSECT operator in MySQL, you can easily simulate this type of query using either the IN clause or the EXISTS clause, depending on the complexity of the INTERSECT query. First, let's explain what an INTERSECT query is. An INTERSECT query returns the intersection of 2 or more datasets.
Give this a try instead. Untested obviously, but I think it will get you the results you want.
select userid
from IncrementalStatistics
where IncrementalStatisticsTypeID = 5
and IncrementalStatistics.AssociatedPlaceID in (47828,40652,14403)
and IncrementalStatistics.Created > '12/2/2010'
group by userid
having count(distinct IncrementalStatistics.AssociatedPlaceID) = 3
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With