I am writing a procedure to delete all rows from a few tables over n days old.
A dead simple query is easy to write
DELETE FROM [myTable]
WHERE [Created] < GETDATE()-30
One problem is there is no index on the date field - I could add one, but I was working around it by doing something like:
SELECT @var = MAX([ID]) FROM myTable WHERE Created < GETDATE()-30;
DELETE FROM myTable WHERE ID < @var
Does that seem like an acceptable method?
The problem is the table is huge, and this query will be deleting likely hundreds of thousands of rows every run.
Running it on a (slightly slow) test server it is taking an hour or so, and killing the table from other processes trying to read/write to it.
I don't so much mind it taking a while to run (though quicker is better) - but I can not have it locking the table for an hour while it is running, as there are constant read/writes going on (mainly writes).
My DB knowledge is quite basic, as I'm a coder not a dba.
Can someone give me a decent method for performing this task - in the most efficient way possible.
Use TRUNCATE TABLE if all rows are to be deleted from the table.
TRUNCATE. TRUNCATE is a statement that will essentially remove all records from the table, just as if you had used DELETE without a WHERE clause. This means TRUNCATE will remove all records in your table, but its structure will remain intact.
What you're looking for is a partitioned based sliding window: How to Implement an Automatic Sliding Window in a Partitioned Table on SQL Server 2005. Partition the table by day and you can efficiently drop an entire day in a single partition switch operation at midnight. Partition switch is basically instantaneous.
If you want a solution with slightly lower overhead (partitioning has serious consequences and ripples through the entire application, specially when the indexes have to be aligned, which is a requirement for fast switch operations), then you have to design your schema in accordance with this operation. With 99.99% confidence I can say that the leftmost clustered key of your myTable
must be the Created
field. This would allow efficient batch deletes (delete top (2500) from myTable where Created < ...
). There are many reasons why you want this to be batched (top 2500 or so at a time), most important being that you must avoid lock escalation and you must keep the size of any individual transaction within reasonable limits.
Your method will suffer from the same malady as a normal delete would - you don't have an index on [Created]. Therefore your method is just more convoluted.
I would suggest that you create said index and try the normal delete on your test server.
Another suggestion - run this outside of normal business hours through a scheduler.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With