I have a very large table (150m+ rows) in SQL Server 2012 (web edition) that has no clustered index and one non-clustered index.
When I run this delete statement:
DELETE TOP(500000)
FROM pick
WHERE tournament_id < 157
(the column is part of the non-clustered index), the execution plan produced by SQL Server looks like this:
The sort step looks problematic: it takes up 45% of the cost, and it raises the warning "Operator used tempdb to spill data during execution." The query takes several minutes to run, and I feel it should be quicker.
Two questions: why is SQL Server sorting here at all, and what can I do to make this delete run faster? I can definitely revisit the indexing strategy on this table if that might help.
Hope this all makes sense - thanks in advance for any tips.
Sorting data is an expensive operation because it entails loading part or all of the data into memory and shifting it back and forth a couple of times. For this reason, indexes can be used to eliminate costly sort operations in queries; however, indexes can also decrease the performance of insert, update, and delete statements, and they increase the disk space used by the database files.
If you are deleting 95% of a table and keeping 5%, it can actually be quicker to move the rows you want to keep into a new table, drop the old table, and rename the new one. Or copy the keeper rows out, truncate the table, and then copy them back in.
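As a rough sketch of the swap approach (the pick_keep name is made up, and you would need to recreate indexes, constraints, and permissions on the new table yourself):
-- Copy only the rows you want to keep into a new table
SELECT *
INTO dbo.pick_keep
FROM dbo.pick
WHERE tournament_id >= 157;
-- Swap: drop the original and rename the copy into place
DROP TABLE dbo.pick;
EXEC sp_rename 'dbo.pick_keep', 'pick';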
I agree that there seems to be no good reason for a sort here.
I don't think it is needed for Halloween protection, as it doesn't show up in the tournament_id = 157 version of the plan.
Also, the sort operation sorts in order of Key ASC, Bmk ASC (presumably to get the rows ordered sequentially in index order), but this is the order the forward index seek on the very same index returns the rows in anyway.
One way of removing it would be to obfuscate the TOP to get a narrow (per-row) rather than a wide (per-index) plan:
DECLARE @N INT = 500000;

-- Hiding the TOP value in a variable and telling the optimizer to plan
-- as though @N were 1 can produce the narrow per-row plan without the sort
DELETE TOP (@N)
FROM pick
WHERE tournament_id < 157
OPTION (OPTIMIZE FOR (@N = 1));
You'd need to test to see if this actually improved things or not.
I would try smaller chunks and a more selective WHERE clause, as well as a way to force SQL Server to pick the TOP rows in an order you specify:
;WITH x AS
(
    SELECT TOP (10000) tournament_id
    FROM dbo.pick
    WHERE tournament_id < 157 -- AND some other WHERE clause, perhaps?
    ORDER BY tournament_id -- , and some other ordering column
)
DELETE x;
More selective could also mean deleting tournament_id < 20, then tournament_id < 40, and so on, instead of picking 500,000 random rows from the whole 1-157 range. It's typically better for your system overall (both in terms of blocking impact, lock escalations, etc., and impact to the log) to perform a series of small transactions rather than one large one. I blogged about this here: http://www.sqlperformance.com/2013/03/io-subsystem/chunk-deletes
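If you want to repeat that delete in a loop until nothing below the threshold remains, a sketch along these lines should work (the 10000 batch size and the one-second pause are arbitrary values to tune for your system):
DECLARE @rc INT = 1;

WHILE @rc > 0
BEGIN
    ;WITH x AS
    (
        SELECT TOP (10000) tournament_id
        FROM dbo.pick
        WHERE tournament_id < 157
        ORDER BY tournament_id
    )
    DELETE x;

    -- Capture how many rows the DELETE touched; the loop ends at zero
    SET @rc = @@ROWCOUNT;

    -- Brief pause between batches to ease pressure on the log and other sessions
    WAITFOR DELAY '00:00:01';
END;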
The sort may still be present in these cases (particularly if it is for Halloween protection or something to do with the RID), but it may be far less problematic at a smaller scale (and please don't judge just by that estimated cost % figure, because those numbers are often garbage). So first I would seriously consider adding a clustered index. Without more requirements I don't have an explicit suggestion for you, but it could be as simple as a clustered index on tournament_id alone (depending on how many rows you have per id), or adding an IDENTITY column which you could use to help determine rows to delete in the future.
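For illustration only (the index names are made up, and building a clustered index on a 150M-row heap is a size-of-data operation you'd want to schedule carefully):
-- Option 1: cluster on the column you filter and delete by
CREATE CLUSTERED INDEX CIX_pick_tournament_id
ON dbo.pick (tournament_id);

-- Option 2: add a surrogate key and cluster on that instead
ALTER TABLE dbo.pick ADD id INT IDENTITY(1,1) NOT NULL;
CREATE UNIQUE CLUSTERED INDEX CIX_pick_id ON dbo.pick (id);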