I have a very large table (150m+ rows) in SQL Server 2012 (web edition) that has no clustered index and one non-clustered index.
When I run this delete statement:
DELETE TOP(500000)
FROM pick
WHERE tournament_id < 157
(the column is part of the non-clustered index), the execution plan produced by SQL Server looks like this:
The sort step looks problematic: it takes up 45% of the cost, and it raises the warning "Operator used tempdb to spill data during execution." The query takes several minutes to run, and I feel it should be quicker.
Two questions: why is SQL Server sorting here at all, and what can I do to make this delete run faster? I can definitely revisit the indexing strategy on this table if that might help.
Hope this all makes sense - thanks in advance for any tips.
Sorting data is an expensive operation because it entails loading part or all of the data into memory and shifting it back and forth a couple of times. For this reason, indexes can be used to eliminate costly sort operations in queries; however, indexes can also decrease the performance of insert, update, and delete statements, and they increase the disk space used by the database files.
If you are deleting 95% of a table and keeping 5%, it can actually be quicker to move the rows you want to keep into a new table, drop the old table, and rename the new one. Or copy the keeper rows out, truncate the table, and then copy them back in.
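As a rough sketch of the swap approach (the pick_keep name is made up, and you would need to recreate indexes, constraints, and permissions on the new table yourself):
-- Copy only the rows you want to keep into a new table
SELECT *
INTO dbo.pick_keep
FROM dbo.pick
WHERE tournament_id >= 157;
-- Swap: drop the original and rename the copy into place
DROP TABLE dbo.pick;
EXEC sp_rename 'dbo.pick_keep', 'pick';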
I agree that there seems to be no good reason for a sort here.
I don't think it is needed for Halloween protection, as it doesn't show up in the tournament_id = 157 version of the plan.
Also, the sort operation sorts in order of Key ASC, Bmk ASC (presumably to get the rows ordered sequentially in index order), but this is the order the forward index seek on the very same index returns the rows in anyway.
One way of removing it would be to obfuscate the TOP to get a narrow (per-row) rather than a wide (per-index) plan:
DECLARE @N INT = 500000;

-- Hiding the TOP value in a variable and telling the optimizer to plan
-- as though @N were 1 can produce the narrow per-row plan without the sort
DELETE TOP (@N)
FROM pick
WHERE tournament_id < 157
OPTION (OPTIMIZE FOR (@N = 1));
You'd need to test to see if this actually improved things or not.
I would try smaller chunks and a more selective WHERE clause, as well as a way to force SQL Server to pick the TOP rows in an order you specify:
;WITH x AS
(
    SELECT TOP (10000) tournament_id
    FROM dbo.pick
    WHERE tournament_id < 157 -- AND some other WHERE clause, perhaps?
    ORDER BY tournament_id -- , and some other ordering column
)
DELETE x;
More selective could also mean deleting tournament_id < 20, then tournament_id < 40, and so on, instead of picking 500,000 random rows from the whole 1-157 range. It's typically better for your system overall (both in terms of blocking impact, lock escalations, etc., and impact to the log) to perform a series of small transactions rather than one large one. I blogged about this here: http://www.sqlperformance.com/2013/03/io-subsystem/chunk-deletes
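If you want to repeat that delete in a loop until nothing below the threshold remains, a sketch along these lines should work (the 10000 batch size and the one-second pause are arbitrary values to tune for your system):
DECLARE @rc INT = 1;

WHILE @rc > 0
BEGIN
    ;WITH x AS
    (
        SELECT TOP (10000) tournament_id
        FROM dbo.pick
        WHERE tournament_id < 157
        ORDER BY tournament_id
    )
    DELETE x;

    -- Capture how many rows the DELETE touched; the loop ends at zero
    SET @rc = @@ROWCOUNT;

    -- Brief pause between batches to ease pressure on the log and other sessions
    WAITFOR DELAY '00:00:01';
END;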
The sort may still be present in these cases (particularly if it is for Halloween protection or something to do with the RID), but it may be far less problematic at a smaller scale (and please don't judge just by that estimated cost % figure, because those numbers are often garbage). So first I would seriously consider adding a clustered index. Without more requirements I don't have an explicit suggestion for you, but it could be as simple as a clustered index on tournament_id alone (depending on how many rows you have per id), or adding an IDENTITY column which you could use to help determine rows to delete in the future.
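For illustration only (the index names are made up, and building a clustered index on a 150M-row heap is a size-of-data operation you'd want to schedule carefully):
-- Option 1: cluster on the column you filter and delete by
CREATE CLUSTERED INDEX CIX_pick_tournament_id
ON dbo.pick (tournament_id);

-- Option 2: add a surrogate key and cluster on that instead
ALTER TABLE dbo.pick ADD id INT IDENTITY(1,1) NOT NULL;
CREATE UNIQUE CLUSTERED INDEX CIX_pick_id ON dbo.pick (id);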