
Deleting large number of rows from SQL Server - in an efficient and non-locking manner

I am writing a procedure to delete all rows from a few tables over n days old.

A dead-simple query is easy to write:

DELETE FROM [myTable] 
WHERE [Created] < GETDATE()-30

One problem is that there is no index on the date field. I could add one, but I was working around it by doing something like the following (which relies on IDs being assigned in chronological order):

DECLARE @var INT;
SELECT @var = MAX([ID]) FROM myTable WHERE [Created] < GETDATE()-30;
DELETE FROM myTable WHERE [ID] <= @var;

Does that seem like an acceptable method?

The problem is the table is huge, and this query will be deleting likely hundreds of thousands of rows every run.

Running it on a (slightly slow) test server, it takes an hour or so and blocks other processes trying to read from or write to the table.

I don't so much mind it taking a while to run (though quicker is better), but I cannot have it locking the table for an hour while it runs, as there are constant reads and writes going on (mainly writes).

My DB knowledge is quite basic, as I'm a coder not a dba.

Can someone give me a decent method for performing this task in the most efficient way possible?

jb. asked May 05 '11 21:05



2 Answers

What you're looking for is a partition-based sliding window: How to Implement an Automatic Sliding Window in a Partitioned Table on SQL Server 2005. Partition the table by day and you can efficiently drop an entire day in a single partition-switch operation at midnight. A partition switch is basically instantaneous.
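A minimal sketch of the switch-out step, assuming the table is partitioned by day on [Created] with a partition function named pf_Daily (illustrative name), and an empty staging table myTable_Staging (also illustrative) with an identical schema, aligned indexes, and the same filegroup:

```sql
-- Find the partition holding the oldest day, switch it out to the
-- staging table (a metadata-only operation), then discard the rows.
DECLARE @oldest DATE = (SELECT MIN(CAST(Created AS DATE)) FROM myTable);
DECLARE @p INT = $PARTITION.pf_Daily(@oldest);  -- partition number for that day

ALTER TABLE myTable SWITCH PARTITION @p TO myTable_Staging;
TRUNCATE TABLE myTable_Staging;  -- near-instant, minimally logged

-- Optionally merge the now-empty boundary out of the partition function:
ALTER PARTITION FUNCTION pf_Daily() MERGE RANGE (@oldest);
```

The switch itself moves no data; it only reassigns which table owns the partition's pages, which is why it completes almost instantly regardless of row count.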

If you want a solution with slightly lower overhead (partitioning has serious consequences and ripples through the entire application, especially since the indexes have to be aligned, which is a requirement for fast switch operations), then you have to design your schema around this operation. With 99.99% confidence I can say that the leftmost clustered key of your myTable must be the Created field. This would allow efficient batch deletes (delete top (2500) from myTable where Created < ...). There are many reasons why you want this to be batched (top 2500 or so at a time), the most important being that you must avoid lock escalation and keep the size of any individual transaction within reasonable limits.
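A batched delete along those lines might look like this (the batch size and delay are illustrative; it assumes the clustered index leads on Created, so each batch is a cheap range scan rather than a full-table scan):

```sql
-- Delete in small batches to keep each transaction short and to stay
-- under the lock-escalation threshold (roughly 5,000 locks per statement).
DECLARE @batch INT = 2500;

WHILE 1 = 1
BEGIN
    DELETE TOP (@batch) FROM myTable
    WHERE Created < DATEADD(DAY, -30, GETDATE());

    IF @@ROWCOUNT < @batch BREAK;  -- last (partial) batch done

    WAITFOR DELAY '00:00:01';      -- pause so concurrent reads/writes get through
END
```

Each iteration commits independently, so blocking is limited to a few thousand rows at a time instead of the whole hour-long delete.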

Remus Rusanu answered Sep 30 '22 08:09


Your method will suffer from the same malady as a normal delete would: without an index on [Created], finding MAX([ID]) still requires scanning the table, so your method is just more convoluted.

I would suggest that you create said index and try the normal delete on your test server.
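For example (the index name is illustrative):

```sql
-- A nonclustered index on the filter column lets the delete seek the
-- qualifying date range instead of scanning the whole table.
CREATE NONCLUSTERED INDEX IX_myTable_Created
    ON myTable (Created);
```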

Another suggestion - run this outside of normal business hours through a scheduler.

Otávio Décio answered Sep 30 '22 10:09