Why is one query extremely slow, yet identical query on similar table runs in the blink of an eye

Tags:

sql-server

query-performance

I have this query, which runs extremely slowly (almost a minute):

    select distinct main.PrimeId
    from PRIME main
    join
    (
        select distinct p.PrimeId
        from PRIME p
        left outer join ATTRGROUP a
            on p.PrimeId = a.PrimeId or p.PrimeId = a.RelatedPrimeId
        where a.PrimeId is not null and a.RelatedPrimeId is not null
    ) mem
        on main.PrimeId = mem.PrimeId

The PRIME table has 18k rows, and has PK on PrimeId.

The ATTRGROUP table has 24k rows, and has a composite PK on PrimeId, col2, then RelatedPrimeId, and then cols 4-7. There's also a separate index on RelatedPrimeId.

The query eventually returns 8.5k rows: the distinct values of PrimeId on the PRIME table that match either PrimeId or RelatedPrimeId on the ATTRGROUP table.

I have the identical query using ATTRADDRESS instead of ATTRGROUP. ATTRADDRESS has the same key and index structure as ATTRGROUP. It has only 11k rows, which is admittedly smaller, but in that case the query runs in about a second and returns 11k rows.

So my question is this:

How can the query be so much slower on one table than on the other, despite the structures being identical?

So far, I've tried this on SQL 2005, and (using the same database, upgraded) SQL 2008 R2. Two of us have independently obtained the same results, restoring the same backup to two different computers.

Other details:

- The bit inside the brackets runs in less than a second, even in the slow query.
- There's a possible clue in the execution plan, which I don't understand. Here's part of it, with a suspicious 320,000,000 row operation:

[Execution plan screenshots: https://i.stack.imgur.com/4hYLr.png and https://i.stack.imgur.com/P4vJo.png]

However, the actual number of rows in that table is a little over 24k, not 320M!
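One way to put numbers next to that plan operator is to capture I/O and timing figures while the slow query runs. This is only a sketch, assuming it is run from a query window against the same database; the query is the one from the question, and the output appears alongside the results as messages.

```sql
-- Report logical reads per table and CPU/elapsed time for each statement.
set statistics io on;
set statistics time on;

select distinct main.PrimeId
from PRIME main
join
(
    select distinct p.PrimeId
    from PRIME p
    left outer join ATTRGROUP a
        on p.PrimeId = a.PrimeId or p.PrimeId = a.RelatedPrimeId
    where a.PrimeId is not null and a.RelatedPrimeId is not null
) mem
    on main.PrimeId = mem.PrimeId;

-- Switch the reporting back off for the rest of the session.
set statistics io off;
set statistics time off;
```

Comparing the logical reads on ATTRGROUP here with the same figures for the ATTRADDRESS version should show whether the table really is being scanned many times over.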

If I refactor the part of the query inside the brackets, so that it uses a UNION rather than an OR, thus:

    select distinct main.PrimeId
    from PRIME main
    join
    (
        select distinct p.PrimeId
        from PRIME p
        left outer join ATTRGROUP a
            on p.PrimeId = a.PrimeId
        where a.PrimeId is not null and a.RelatedPrimeId is not null
        union
        select distinct p.PrimeId
        from PRIME p
        left outer join ATTRGROUP a
            on p.PrimeId = a.RelatedPrimeId
        where a.PrimeId is not null and a.RelatedPrimeId is not null
    ) mem
        on main.PrimeId = mem.PrimeId

... then the previously slow query takes under a second.
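Before swapping one form for the other in generated SQL, it may be worth confirming that they return the same set. A minimal sketch using EXCEPT (available from SQL Server 2005), built only from the two inner queries shown above:

```sql
-- PrimeIds returned by the OR form but missing from the UNION form.
select distinct p.PrimeId
from PRIME p
left outer join ATTRGROUP a
    on p.PrimeId = a.PrimeId or p.PrimeId = a.RelatedPrimeId
where a.PrimeId is not null and a.RelatedPrimeId is not null
except
(
    select p.PrimeId
    from PRIME p
    left outer join ATTRGROUP a
        on p.PrimeId = a.PrimeId
    where a.PrimeId is not null and a.RelatedPrimeId is not null
    union
    select p.PrimeId
    from PRIME p
    left outer join ATTRGROUP a
        on p.PrimeId = a.RelatedPrimeId
    where a.PrimeId is not null and a.RelatedPrimeId is not null
);
```

This checks one direction only. Running it again with the two sides of the EXCEPT swapped covers the other; if both runs return no rows, the two forms agree on this data.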

I'd greatly appreciate any insight on this! Let me know if you need any more info and I'll update the question. Thanks!

By the way, I realise that in this example there's a redundant join. This can't easily be removed, since in production the whole thing is generated dynamically, and the bit in the brackets takes many different forms.

Edit:

I've rebuilt the indexes on ATTRGROUP; it makes no significant difference.

Edit 2:

If I use a temporary table, thus:

    select distinct p.PrimeId
    into #temp
    from PRIME p
    left outer join ATTRGROUP a
        on p.PrimeId = a.PrimeId or p.PrimeId = a.RelatedPrimeId
    where a.PrimeId is not null and a.RelatedPrimeId is not null

    select distinct main.PrimeId
    from Prime main
    join #temp mem
        on main.PrimeId = mem.PrimeId

... then again, even with an OR in the original OUTER JOIN, it runs in less than a second. I hate temp tables like this, since it always feels like an admission of defeat, so it isn't the refactor I'll be using, but I thought it was interesting that it makes such a difference.

Edit 3:

Updating the stats makes no difference either.
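For anyone wanting to repeat those two maintenance steps, they might look something like the following. The question doesn't say which commands were actually used, so this is an assumption; it also assumes the tables live in the default schema.

```sql
-- Rebuild every index on the table (syntax available from SQL Server 2005).
alter index all on ATTRGROUP rebuild;

-- Refresh statistics by reading every row rather than a sample.
update statistics ATTRGROUP with fullscan;
update statistics PRIME with fullscan;
```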

Thanks for all your suggestions so far.


asked Aug 12 '11 07:08

ChrisA

3 Answers

In my experience it's better to use two left joins rather than an OR in the JOIN clause. So instead of:

    left  outer join ATTRGROUP a 
    on p.PrimeId = a.PrimeId   or p.PrimeId = a.RelatedPrimeId

I would suggest:

    left  outer join ATTRGROUP a 
    on p.PrimeId = a.PrimeId
    left  outer join ATTRGROUP a2
    on p.PrimeId = a2.RelatedPrimeId
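
Applied to the whole query from the question, that suggestion might look like the sketch below. The WHERE clause is an assumption about how the original null checks would carry over: a PRIME row is kept if either join found a match.

```sql
select distinct main.PrimeId
from PRIME main
join
(
    select distinct p.PrimeId
    from PRIME p
    left outer join ATTRGROUP a
        on p.PrimeId = a.PrimeId
    left outer join ATTRGROUP a2
        on p.PrimeId = a2.RelatedPrimeId
    -- Keep the row if either join matched; the join columns are part of
    -- the primary key, so they are null only when no match was found.
    where a.PrimeId is not null
       or a2.RelatedPrimeId is not null
) mem
    on main.PrimeId = mem.PrimeId;
```

One thing to watch: a PrimeId with many matches on both sides produces the product of those matches before DISTINCT collapses them, so it is worth timing this against the UNION form.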

answered Sep 30 '22 03:09

WillMcKill

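I notice that the main query isn't correlated with the subquery. Correlating them would look something like this:

    -- Note: SQL Server does not accept a reference to main inside a derived
    -- table like this; the same idea needs CROSS APPLY or EXISTS instead.
    select distinct main.PrimeId
    from PRIME main
    join
    (
        select distinct p.PrimeId
        from PRIME p
        left outer join ATTRGROUP a
            on p.PrimeId = a.PrimeId
        where main.PrimeId = a.PrimeId   -- changed: correlates with main
        union
        select distinct p.PrimeId
        from PRIME p
        left outer join ATTRGROUP a
            on p.PrimeId = a.RelatedPrimeId
        where main.PrimeId = a.PrimeId   -- changed: correlates with main
    ) mem
        on main.PrimeId = mem.PrimeId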

In this construction you don't need the 'is not null' clauses either (would you ever need them, since a primary key column never holds a null value?).

I was taught to avoid OR constructions (as others have already advised), and also to avoid 'is not null' and 'in (value list)' constructions. These can mostly be replaced by a (NOT) EXISTS clause.
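
As an illustration of the EXISTS approach, the bit inside the brackets could be sketched like this. It is one possible rewrite, not tested against the original data, and it relies on PrimeId being the primary key of PRIME so that no DISTINCT is needed.

```sql
-- Each EXISTS can seek its own index: the composite PK leads on PrimeId,
-- and there is a separate index on RelatedPrimeId.
select p.PrimeId
from PRIME p
where exists (select 1 from ATTRGROUP a  where a.PrimeId = p.PrimeId)
   or exists (select 1 from ATTRGROUP a2 where a2.RelatedPrimeId = p.PrimeId);
```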

answered Sep 30 '22 04:09

Daan Remmers

This is not a direct answer, but if you have FK constraints from ATTRGROUP.PrimeId and ATTRGROUP.RelatedPrimeId to PRIME.PrimeId, then your query is equivalent to this much simpler one:

    select PrimeId from ATTRGROUP a
    union
    select RelatedPrimeId from ATTRGROUP a

answered Sep 30 '22 02:09

A-K
