
Why is there a problematic sort in my SQL Server delete query plan?

Tags: sql-server

I have a very large table (150 million+ rows) in SQL Server 2012 (Web edition) that has no clustered index and one non-clustered index.

When I run this delete statement:

DELETE TOP(500000) 
FROM pick 
WHERE tournament_id < 157

(the tournament_id column is in the non-clustered index), the execution plan produced by SQL Server looks like this:

[image: query plan]

The sort step looks problematic - it takes up 45% of the cost, and it is causing an alert saying "operator used tempdb to spill data during execution." The query is taking several minutes to run, and I feel like it should be quicker.

Two questions:

  1. Why is there a sort step in the plan?
  2. Any ideas how to overcome the spill? The server has 64 GB of RAM and tempdb is sized at 8 x 4 GB data files.

I can definitely revisit the indexing strategy on this table if that might help.

Hope this all makes sense - thanks in advance for any tips.

Andy W asked Jan 29 '14 20:01



2 Answers

I agree that there seems to be no good reason for a sort here.

I don't think it is needed for Halloween protection, as it doesn't show up in the plan for the tournament_id = 157 version of the query.

Also, the sort orders the rows by Key ASC, Bmk ASC (presumably so they are processed sequentially in index order), but the forward index seek on the very same index already returns the rows in that order anyway.

One way of removing it would be to obfuscate the TOP to get a narrow (per row) rather than a wide (per index) plan.

DECLARE @N INT = 500000

DELETE TOP(@N) 
FROM pick
WHERE  tournament_id < 157 
OPTION (OPTIMIZE FOR (@N=1))


You'd need to test to see if this actually improved things or not.
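One way to run that comparison is sketched below. It assumes a restored non-production copy of the database, and it wraps each variant in a transaction that is rolled back so both runs see the same rows. The `SET STATISTICS` output and the actual execution plans can then be compared side by side. Bear in mind that the second run may benefit from a warmer buffer cache, and that rolling back 500,000 deletes takes time of its own.

```sql
-- Sketch only: run against a test copy, not production.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Variant 1: the original statement (wide plan with the sort).
BEGIN TRANSACTION;

DELETE TOP (500000)
FROM dbo.pick
WHERE tournament_id < 157;

ROLLBACK TRANSACTION;

-- Variant 2: TOP hidden behind a variable and optimized for 1 row.
BEGIN TRANSACTION;

DECLARE @N INT = 500000;

DELETE TOP (@N)
FROM dbo.pick
WHERE tournament_id < 157
OPTION (OPTIMIZE FOR (@N = 1));

ROLLBACK TRANSACTION;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```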

Martin Smith answered Nov 07 '22 20:11


I would try smaller chunks and a more selective WHERE clause, as well as a way to force SQL Server to pick the TOP rows in an order you specify:

;WITH x AS
(
  SELECT TOP (10000) tournament_id
  FROM dbo.pick
  WHERE tournament_id < 157 -- AND some other where clause perhaps?
  ORDER BY tournament_id -- , AND some other ordering column
)
DELETE x;

More selective could also mean deleting tournament_id < 20, then tournament_id < 40, and so on, instead of picking an arbitrary 500,000 rows with tournament_id below 157. Typically it's better for your system overall (in terms of blocking and lock escalation, as well as impact on the log) to perform a series of small transactions rather than one large one. I blogged about this here: http://www.sqlperformance.com/2013/03/io-subsystem/chunk-deletes
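A series of small transactions could be driven by a loop along these lines. This is a minimal sketch rather than the code from the linked post: the 10,000-row batch size is carried over from the CTE above, and the `@Rows` variable is introduced here purely for illustration.

```sql
-- Sketch only: batch size is an assumption to tune for your system.
SET NOCOUNT ON;

DECLARE @Rows INT = 1;  -- hypothetical loop counter; 1 forces the first pass

WHILE @Rows > 0
BEGIN
    BEGIN TRANSACTION;

    -- Delete the next chunk in a defined order.
    WITH x AS
    (
      SELECT TOP (10000) tournament_id
      FROM dbo.pick
      WHERE tournament_id < 157
      ORDER BY tournament_id
    )
    DELETE x;

    -- Capture the count immediately; the loop ends when nothing was deleted.
    SET @Rows = @@ROWCOUNT;

    COMMIT TRANSACTION;
END;
```

Each pass commits on its own, so locks are released between chunks and the log can be cleared between batches (by checkpoints under the simple recovery model, or by log backups under full recovery).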

The sort may still be present in these cases (particularly if it is for Hallowe'en protection or something to do with the RID), but it may be far less problematic at a smaller scale. Please don't judge it just by the estimated cost % number, because those numbers are often garbage. Beyond that, the first thing I would really consider is adding a clustered index. Without more requirements I don't have an explicit suggestion for you, but it could be as simple as a clustered index only on tournament_id (depending on how many potential rows you have per id), or adding an IDENTITY column which you could potentially use to help determine rows to delete in the future.
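The two indexing options could look roughly like this. The index names and the `pick_id` column are hypothetical, and a table can have only one clustered index, so these are alternatives rather than steps. Building a clustered index on a 150-million-row heap rewrites the table and rebuilds the existing non-clustered index, so it is worth trying on a copy first and scheduling it in a maintenance window.

```sql
-- Sketch only: object names are assumptions. Pick ONE option.

-- Option A: cluster directly on the column used in the delete predicate.
-- The index is non-unique, so SQL Server adds a hidden uniquifier to duplicates.
CREATE CLUSTERED INDEX CIX_pick_tournament_id
    ON dbo.pick (tournament_id);

-- Option B: add an IDENTITY surrogate key and cluster on that instead.
-- Left commented out because it cannot coexist with Option A.
-- ALTER TABLE dbo.pick
--     ADD pick_id BIGINT IDENTITY(1,1) NOT NULL;
-- CREATE UNIQUE CLUSTERED INDEX CIX_pick_pick_id
--     ON dbo.pick (pick_id);
```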

Aaron Bertrand answered Nov 07 '22 19:11