SQL Batched Delete

I have a table in SQL Server 2005 which has approx 4 billion rows in it. I need to delete approximately 2 billion of these rows. If I try and do it in a single transaction, the transaction log fills up and it fails. I don't have any extra space to make the transaction log bigger. I assume the best way forward is to batch up the delete statements (in batches of ~ 10,000?).

I can probably do this using a cursor, but is there a standard/easy/clever way of doing this?

P.S. This table does not have an identity column as a PK. The PK is made up of an integer foreign key and a date.

Tom Ferguson asked May 22 '09 08:05



4 Answers

You can 'nibble' the deletes, which also means that you don't put a massive load on the database. If your transaction log backups run every 10 minutes, you should be OK to run this once or twice over the same interval. You can schedule it as a SQL Agent job.

Try something like this:

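-- Deletes one batch of at most @count matching rows per run
DECLARE @count int
SET @count = 10000

DELETE FROM table1
WHERE table1id IN (
    SELECT TOP (@count) table1id  -- must be the same key column as in the outer WHERE
    FROM table1
    WHERE x = 'y'
)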
Nick Kavadias answered Oct 22 '22 03:10


In addition to putting this in a batch with a statement to truncate the log, you also might want to try these tricks:

  • Add a criterion that matches the first column in your clustered index, in addition to your other criteria
  • If it's possible and won't interfere with anything else going on in the DB, drop any indexes from the table and put them back after the delete is done, but KEEP the clustered index

For the first point above, for example, if your PK is clustered, then find a range which approximately matches the number of rows that you want to delete in each batch and use that:

DECLARE @max_id INT, @start_id INT, @end_id INT, @interval INT
SELECT @start_id = MIN(id), @max_id = MAX(id) FROM My_Table
SET @interval = 100000  -- You need to determine the right number here
SET @end_id = @start_id + @interval

WHILE (@start_id <= @max_id)
BEGIN
     DELETE FROM My_Table WHERE id BETWEEN @start_id AND @end_id AND <your criteria>

     SET @start_id = @end_id + 1
     SET @end_id = @end_id + @interval
END
Tom H answered Oct 22 '22 03:10


What distinguishes the rows you want to delete from those you want to keep? Will this work for you:

while exists (select 1 from your_table where <your_condition>)
delete top(10000) from your_table
where <your_condition>
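
A variant of the same loop, as a rough sketch only: it uses @@ROWCOUNT to decide when to stop, so the condition is not evaluated a second time by an EXISTS check on every pass. The status column and its value are hypothetical stand-ins for <your_condition>, and the batch size and delay are assumptions you would need to tune. It avoids inline DECLARE initialisation so it should also run on SQL Server 2005.

```sql
-- Sketch only: your_table is from the answer above; the status column,
-- its value, the batch size and the delay are hypothetical placeholders.
DECLARE @batch_size INT
DECLARE @rows_deleted INT
SET @batch_size = 10000
SET @rows_deleted = 1          -- non-zero so the loop body runs at least once

WHILE @rows_deleted > 0
BEGIN
    -- Each DELETE runs as its own autocommit transaction,
    -- so only one batch at a time has to fit in the log.
    DELETE TOP (@batch_size) FROM your_table
    WHERE status = 'obsolete'  -- stand-in for <your_condition>

    -- Capture the count immediately; the next statement resets @@ROWCOUNT.
    SET @rows_deleted = @@ROWCOUNT

    -- Brief pause so log backups and other sessions get a turn.
    WAITFOR DELAY '00:00:01'
END
```

The loop ends after the first pass that deletes nothing. Under the FULL recovery model the log space is only reused once a log backup has run, so the pause may need to be longer than shown here.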
Stanislav Kniazev answered Oct 22 '22 03:10


Sounds like this is a one-off operation (I hope, for your sake) and you don't need to be able to go back to a state that's halfway through this batched delete. If that's the case, why don't you just switch to the SIMPLE recovery model before running it and then back to FULL when you're done?

This way the transaction log won't grow as much. This might not be ideal in most situations but I don't see anything wrong here (assuming as above you don't need to go back to a state that's in between your deletes).

You can do this in your script with something like:

ALTER DATABASE myDB SET RECOVERY SIMPLE  -- before the delete
ALTER DATABASE myDB SET RECOVERY FULL    -- after the delete is done
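
The whole recovery-model switch might look roughly like the sketch below. myDB is the name used in this answer; the backup path is a hypothetical placeholder. Two assumptions worth stating: the delete still has to be batched, because even under SIMPLE a single huge transaction has to fit in the log, and switching back to FULL does not restore the log backup chain until a full (or differential) backup has been taken.

```sql
-- Sketch only: run while connected to myDB. The backup path is hypothetical.
ALTER DATABASE myDB SET RECOVERY SIMPLE

-- Optional check that the change took effect (sys.databases exists in SQL Server 2005+).
SELECT name, recovery_model_desc
FROM sys.databases
WHERE name = 'myDB'

-- ... run the batched delete here, one small transaction per batch ...

-- Under SIMPLE, a checkpoint lets the inactive part of the log be reused.
CHECKPOINT

ALTER DATABASE myDB SET RECOVERY FULL

-- Restart the log backup chain; until this runs, log backups cannot be restored from.
BACKUP DATABASE myDB TO DISK = N'D:\Backups\myDB_after_delete.bak'
```

While the database is in SIMPLE, point-in-time restore is not available, which is the trade-off this answer is accepting.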

Alternatively, you can set up a job to shrink the transaction log at a given interval while your delete is running. This is kinda bad, but I reckon it'd do the trick.

JohnIdol answered Oct 22 '22 04:10