I'm reading "Dissecting SQL Server Execution Plans" from Grant Fritchey and it's helping me a lot to see why certain queries are slow.
However, I am stumped with this case where a simple rewrite performs quite a lot faster.
This is my first attempt and it takes 21 secs. It uses a derived table:
-- 21 secs
SELECT *
FROM Table1 AS o
JOIN (
    SELECT col1
    FROM Table1
    GROUP BY col1
    HAVING COUNT(*) > 1
) AS i ON i.col1 = o.col1;
My second attempt simply moves the derived table out into a temp table, and it is 3 times faster:
-- 7 secs
SELECT col1
INTO #doubles
FROM Table1
GROUP BY col1
HAVING COUNT(*) > 1;

SELECT *
FROM Table1 AS o
JOIN #doubles AS i ON i.col1 = o.col1;
My main interest is in why moving from a derived table to a temp table improves performance so much, not in how to make it even faster.
I would be grateful if someone could show me how I can diagnose this issue using the (graphical) execution plan.
XML execution plan: https://www.sugarsync.com/pf/D6486369_1701716_16980
Edit 1
When I created statistics on the 2 columns that were specified in the GROUP BY, the optimizer started doing "the right thing" (after clearing the procedure cache; don't forget to do that if you are a beginner!). In retrospect, the way I simplified the query in the question was not a good simplification: the attached sqlplan shows the 2 columns, but this was not obvious.
The estimates are now a lot more accurate, as is the performance, which is on par with the temp table solution. As you know, the optimizer creates statistics on single columns automatically (if not disabled), but two-column statistics have to be created by the DBA.
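For reference, here is roughly what I ran; the statistic name is my own, and assume the two grouping columns are col1 and col2 (the real names differ):

-- The optimizer only auto-creates single-column statistics,
-- so create the two-column statistic by hand.
CREATE STATISTICS Stats_col1_col2 ON Table1 (col1, col2);

-- Clear the plan cache so the old cached plan doesn't mask the new statistics
-- (fine on a dev box; on a shared server, evict just the one plan instead).
DBCC FREEPROCCACHE;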
A (non-clustered) index on these 2 columns made the query perform the same, but in this case a statistic is just as good, and it doesn't suffer the downside of index maintenance. I'm going forward with the two-column statistic and will see how it performs. @Grant Do you know if the statistics on an index are more reliable than those of a column statistic?
Edit 2
Once a problem is solved, I always follow up on how a similar problem can be diagnosed faster in the future.
The problem here was that the estimated row counts were way off. The graphical execution plan shows these when you hover over an operator, but that's about it.
Some tools that can help:
SET STATISTICS PROFILE: I heard this one will become obsolete and be replaced by its XML variant, but I still like the output, which is in grid format. Here the big difference between the "Rows" and "EstimateRows" columns would have shown the problem (see the sketch after this list).
This is a nice tool, especially if you are a beginner: it highlights problems.
A more general-purpose tool, but again it directs the user to potential problems.
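A minimal sketch of the first option, run against the query from the question:

-- Show per-operator runtime counters in grid format;
-- compare the "Rows" (actual) and "EstimateRows" (estimated) columns.
SET STATISTICS PROFILE ON;

SELECT *
FROM Table1 AS o
JOIN (
    SELECT col1
    FROM Table1
    GROUP BY col1
    HAVING COUNT(*) > 1
) AS i ON i.col1 = o.col1;

SET STATISTICS PROFILE OFF;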
Kind Regards, Tom
Advantages of temporary tables: you can create a temporary table and insert, delete, and update its records without worrying about whether you have sufficient rights to change data in permanent tables, or whether you might accidentally be doing so.
The reason temp tables are faster at loading data is that they are created in tempdb, where logging works very differently: data modifications are minimally logged compared to a regular table, hence operations with temp tables are faster.
Looking at the values in the first execution plan, it looks like it's statistics. You have an estimated number of rows of 800 and an actual of 1.2 million. I think you'll find that updating the statistics will change the way the first query's plan is generated.
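A minimal sketch of that fix, using the table from the question (WITH FULLSCAN is my choice here; the default sampled scan may be enough):

-- Refresh the statistics on Table1 from a full scan of the data,
-- then clear the cached plan so the next run is compiled with fresh estimates.
UPDATE STATISTICS Table1 WITH FULLSCAN;
DBCC FREEPROCCACHE;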