Why are subquery and join so slow?

I need to select rows from the BUNDLES table that have one of several SAP_STATE_ID values. Those values depend on whether the respective SAP status is supposed to be exported or not.

This query runs really fast (there is an index on the SAP_STATE_ID field):

SELECT b.* FROM BUNDLES b WHERE b.SAP_STATE_ID IN (2,3,5,6)

But I'd like to fetch the list of IDs dynamically, like this:

SELECT b.* FROM BUNDLES b 
WHERE b.SAP_STATE_ID IN 
(SELECT s.SAP_STATE_ID FROM SAP_STATES s WHERE s.EXPORT_TO_SAP = 1)

And ouch, this query suddenly takes far too long. I would expect SQL Server to run the subquery first (it doesn't depend on anything from the main query) and then run the whole thing just like my first example. I tried rewriting it to use a join instead of a subquery:

SELECT b.* FROM BUNDLES b 
JOIN SAP_STATES s ON (s.SAP_STATE_ID = b.SAP_STATE_ID) 
WHERE s.EXPORT_TO_SAP = 1

but it has the same poor performance. It seems to be running the subquery for each row of the BUNDLES table, or something like that. I am not very skilled at reading execution plans, but I tried. It says that 81% of the cost is for scanning the primary key index of BUNDLES (I have no idea why it would do that; the BUNDLE_ID field is defined as PRIMARY KEY, but it doesn't appear in the query at all...)

Does anyone have an explanation for why SQL Server is so "stupid"? Is there a way to achieve what I want with good performance, but without having to provide a static list of SAP_STATE_IDs?

script for both tables and relevant indexes - http://mab.to/xbYiI0wKj

execution plan for subquery version - http://mab.to/8Qh6gpdYZ

query plan for version with joins - http://mab.to/YCqeGCUbr

(for some reason these two plans look the same, and both suggest creating a BUNDLES.SAP_STATE_ID index, which is already there)

asked Sep 30 '14 15:09

lot

2 Answers

I am pretty sure the statistics on your tables are off. If you want to get it working in a hurry, I would write the query as:

SELECT b.*
  FROM SAP_STATES s 
 INNER LOOP JOIN BUNDLES b 
    ON s.SAP_STATE_ID = b.SAP_STATE_ID
 WHERE s.EXPORT_TO_SAP = 1

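The LOOP hint forces a nested loops join driven by SAP_STATES: SQL Server reads the SAP_STATES rows that pass the EXPORT_TO_SAP filter first, then looks up the matching BUNDLES rows for each of them. Specifying a join hint also fixes the join order to the order in which the tables are written.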
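If stale statistics really are the cause, refreshing them may let the optimizer choose the seek plan on its own, without a join hint. A minimal sketch, assuming the two tables live in the default schema and that a full scan of both is affordable (on very large tables a sampled update may be preferable):

```sql
-- Rebuild statistics for both tables by reading every row.
-- Assumption: table names as in the question, default schema.
UPDATE STATISTICS BUNDLES WITH FULLSCAN;
UPDATE STATISTICS SAP_STATES WITH FULLSCAN;

-- Then re-run the original query and compare the execution plan.
SELECT b.*
  FROM BUNDLES b
  JOIN SAP_STATES s
    ON s.SAP_STATE_ID = b.SAP_STATE_ID
 WHERE s.EXPORT_TO_SAP = 1;
```

If the plan still scans the primary key of BUNDLES afterwards, the hinted query above remains the quick workaround.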

answered Oct 23 '22 13:10

Filip De Vos

When you use tables (temporary or physical), the SQL engine builds statistics on them, so it has a very clear idea of the number of rows in each and of the best execution approach. A computed table (a subquery), on the other hand, has no statistics of its own.

So while it might seem simple for a human to deduce the number of rows in it, the "stupid" SQL engine is unaware of all this. Now, coming to the query, the WHERE s.EXPORT_TO_SAP = 1 clause makes a world of difference here. The clustered index is sorted and built on SAP_STATE_ID, but to additionally check the WHERE clause, the engine has no option but to scan the entire table (in the final dataset)! I bet that if, instead of the clustered index, there were a nonclustered covering index on the SAP_STATE_ID column that also covered the EXPORT_TO_SAP field, it might have done the trick. Since clustered index scans are generally bad for performance, I suggest the approach below:

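-- Materialize the exportable state IDs in a temp table,
-- which gets its own statistics
SELECT s.SAP_STATE_ID
INTO #Sap_State
FROM SAP_STATES s
WHERE s.EXPORT_TO_SAP = 1;

-- Join BUNDLES against the small temp table
SELECT b.*
FROM BUNDLES b
JOIN #Sap_State a ON a.SAP_STATE_ID = b.SAP_STATE_ID;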
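The covering index mentioned above could be sketched as follows. The index name IX_SAP_STATES_STATE_EXPORT is hypothetical, and whether the optimizer actually picks this index depends on the real table definitions and data, so treat it as something to try and then verify in the execution plan:

```sql
-- Hypothetical index name; key and included column follow the suggestion above.
-- The index is keyed on SAP_STATE_ID and carries EXPORT_TO_SAP in its leaf rows,
-- so the filter can be evaluated without touching the clustered index.
CREATE NONCLUSTERED INDEX IX_SAP_STATES_STATE_EXPORT
    ON SAP_STATES (SAP_STATE_ID)
    INCLUDE (EXPORT_TO_SAP);
```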

answered Oct 23 '22 12:10

SouravA

Tags: sql, join, sql-server
