Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Two queries. Same Output. One takes 2 hours and the other 0 seconds. Why?

I have some IDs inserted into a temp table #A as follows:

SELECT DISTINCT ID
INTO #A
FROM LocalDB.dbo.LocalTable1
WHERE ID NOT IN (SELECT DISTINCT ID FROM LocalDB.dbo.LocalTable2)
GO

CREATE INDEX TT ON #A(ID)
GO

I am trying to obtain some information from a remote linked server using the identifiers I gathered in the previous stage:

Query 1:

SELECT ID, Desc
FROM RemoteLinkedServer.DB.dbo.RemoteTable X
WHERE ID IN (SELECT ID FROM #A)

Query 2:

SELECT ID, Desc
FROM RemoteLinkedServer.DB.dbo.RemoteTable X
INNER JOIN #A Y
ON X.ID = Y.ID

Now in the following query, what I am doing is obtain the output of the temp table, copy the rows and format them properly into a comma-separated list and manually putting it in the query.

Query 3:

SELECT ID, Desc
FROM RemoteLinkedServer.DB.dbo.RemoteTable X
WHERE ID IN (-- Put all identifiers here --)

Queries 1 and 2 take 2 hours to execute and query 3 takes 0 seconds (my temp table contains about 200 rows). I don't know what's going on and do not have permissions to check if the remote server has the relevant indexes on ID but it is simply baffling to see that a manually constructed query runs in no time indicating that there is something that is going wrong at the query optimization phase.

Any ideas on what's going wrong here or how I could speed up my query?

like image 667
Legend Avatar asked Aug 11 '12 00:08

Legend


2 Answers

Queries 1 and 2 cause ALL of the data in the RemoteTable to be pulled into your local database in order to perform the join operation. This is going to eat RAM, network bandwidth and generally be very slow while the query is executing.

Query 3 allows the remote server to filter down the results to send just those matches you want.

Basically, it boils down to who does the work. Queries 1/2 require your local DB to do it; Query 3 lets the remote one do it.

If you have a lot of data in that remote table, then you'll likely run into network congestion etc.

The best approach to querying linked servers is to construct your queries such as the remote server does all the work and just sends results back to your local one. This will optimize the amount of network, memory and disk resources required to get the data you want.

Any time you have to join across server boundaries (using a linked server) it's going to be a disaster.

like image 126
NotMe Avatar answered Oct 16 '22 16:10

NotMe


For reference, this is how I solved the problem based on @ChrisLively's suggestions:

SELECT DISTINCT ID
INTO #A
FROM LocalDB.dbo.LocalTable1
WHERE ID NOT IN (SELECT DISTINCT ID FROM LocalDB.dbo.LocalTable2)
GO

CREATE INDEX TT ON #A(ID)
GO

DECLARE @IDList VARCHAR(MAX)
SELECT @IDList=(SELECT TOP 1 REPLACE(RTRIM((
                SELECT DISTINCT CAST(ID AS VARCHAR(MAX)) + ' ' 
                FROM #A AS InnerTable               
                FOR XML PATH (''))),' ',', '))
FROM #A AS OuterResults


DECLARE @sql AS varchar(MAX)
SET @sql = 'SELECT * FROM RemoteLinkedServer.RemoteDB.dbo.RemoteTable X WHERE ID IN (' + @IDList + ')'

EXEC (@sql)

DROP TABLE #A
GO
like image 23
Legend Avatar answered Oct 16 '22 17:10

Legend