
Deleting 1 million rows in SQL Server

Tags:

sql-server

I am working on a client's database, and there are about 1 million rows that need to be deleted due to a bug in the software. Is there an efficient way to delete them besides:

DELETE FROM table_1 where condition1 = 'value' ?
Peter Sun asked Jul 16 '14 16:07



6 Answers

Here is a structure for a batched delete, as suggested above. Do not try to delete 1M rows at once...

The batch size and the WAITFOR delay are quite variable; they depend on your server's capabilities and on how much contention you need to avoid. You may need to delete some rows manually, measure how long that takes, and adjust the batch size to something your server can handle. As mentioned above, a single statement that takes more than about 5,000 locks can trigger lock escalation to a table lock (which I was not aware of).
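
One way to do that measurement is to time a single small batch. This is only a sketch: dbo.table_1 and condition1 are the names from the question, and the trial size of 500 is an arbitrary assumption. Note that it really deletes the rows it matches.

```sql
-- Trial run: time one small batch before settling on a batch size.
-- dbo.table_1 and condition1 come from the question; 500 is an arbitrary trial size.
DECLARE @trial INT = 500;
DECLARE @start DATETIME2(3) = SYSDATETIME();
DECLARE @rows INT;

DELETE TOP (@trial)
FROM dbo.table_1
WHERE condition1 = 'value';

SET @rows = @@ROWCOUNT;  -- capture immediately; later statements reset @@ROWCOUNT

SELECT @rows AS rows_deleted,
       DATEDIFF(MILLISECOND, @start, SYSDATETIME()) AS elapsed_ms;
```

Repeating this with a few different trial sizes gives a rough rows-per-second figure to pick the batch size from.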

This would be best done after hours... but 1M rows is really not a lot for SQL Server to handle. If you watch the Messages tab in SSMS, PRINT output is buffered, so it may take a while to show; it will appear after several batches, but be aware that it won't update in real time.

Edit: Added a stop time, @MAXRUNTIME and @BSTOPATMAXTIME. If you set @BSTOPATMAXTIME to 1, the script will stop on its own at the desired time, say 8:00 AM. This way you can schedule it nightly to start at, say, midnight, and it will stop before production starts at 8 AM.

Edit: The answer is pretty popular, so I have added RAISERROR in lieu of PRINT, per the comments.

DECLARE @BATCHSIZE INT, @WAITFORVAL VARCHAR(8), @ITERATION INT, @TOTALROWS INT, @MAXRUNTIME VARCHAR(8), @BSTOPATMAXTIME BIT, @MSG VARCHAR(500)
SET DEADLOCK_PRIORITY LOW;   -- IF WE DEADLOCK WITH ANOTHER SESSION, THIS SCRIPT IS THE VICTIM
SET @BATCHSIZE = 4000        -- ROWS PER DELETE
SET @WAITFORVAL = '00:00:10' -- PAUSE BETWEEN BATCHES (HH:MM:SS)
SET @MAXRUNTIME = '08:00:00' -- 8AM
SET @BSTOPATMAXTIME = 1 -- ENFORCE 8AM STOP TIME
SET @ITERATION = 0 -- LEAVE THIS
SET @TOTALROWS = 0 -- LEAVE THIS

WHILE @BATCHSIZE>0
BEGIN
    -- IF @BSTOPATMAXTIME = 1, THEN WE'LL STOP THE WHOLE JOB AT A SET TIME...
    -- (THIS COMPARES TIME-OF-DAY STRINGS, SO START THE JOB AFTER MIDNIGHT AND BEFORE @MAXRUNTIME)
    IF CONVERT(VARCHAR(8),GETDATE(),108) >= @MAXRUNTIME AND @BSTOPATMAXTIME=1
    BEGIN
        RETURN
    END

    DELETE TOP(@BATCHSIZE)
    FROM SOMETABLE
    WHERE 1=2 -- PLACEHOLDER THAT MATCHES NO ROWS; REPLACE WITH YOUR REAL CONDITION

    SET @BATCHSIZE=@@ROWCOUNT -- A BATCH THAT DELETES 0 ROWS ENDS THE LOOP
    SET @ITERATION=@ITERATION+1
    SET @TOTALROWS=@TOTALROWS+@BATCHSIZE
    SET @MSG = 'Iteration: ' + CAST(@ITERATION AS VARCHAR(12)) + ' Total deletes:' + CAST(@TOTALROWS AS VARCHAR(12))
    RAISERROR (@MSG, 0, 1) WITH NOWAIT -- SEVERITY 0 + NOWAIT SENDS THE MESSAGE IMMEDIATELY
    WAITFOR DELAY @WAITFORVAL
END
Dave Cullum answered Nov 05 '22 18:11


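-- Commit each batch separately; one transaction around the whole loop
-- would hold every lock and all the log space until the final COMMIT.
DECLARE @rows INT
DoAgain:
    BEGIN TRANSACTION
    DELETE TOP (1000)
    FROM <YourTable>
    WHERE <predicate>
    SET @rows = @@ROWCOUNT   -- capture before COMMIT resets @@ROWCOUNT
    COMMIT TRANSACTION
    IF @rows > 0
    GOTO DoAgain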
sansalk answered Nov 05 '22 17:11


Maybe this solution from Uri Dimant

WHILE 1 = 1
BEGIN
   DELETE TOP(2000)
   FROM Foo
   WHERE <predicate>;
   IF @@ROWCOUNT < 2000 BREAK;
END

(Link: https://social.msdn.microsoft.com/Forums/sqlserver/en-US/b5225ca7-f16a-4b80-b64f-3576c6aa4d1f/how-to-quickly-delete-millions-of-rows?forum=transactsql)

Tung Nguyen answered Nov 05 '22 18:11


Here is something I have used:

  1. If the bad data is mixed in with the good:

    -- SELECT ... INTO creates #table and fills it with the rows to keep
    SELECT columns
    INTO #table
    FROM old_table
    WHERE <condition that excludes the bad rows>
    
    TRUNCATE TABLE old_table
    
    INSERT INTO old_table 
       SELECT columns FROM #table
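
As a rough sketch, the same copy, truncate, and reload steps can be wrapped in a transaction so that a failure during the reload does not leave old_table empty. The column names col1 and col2 and the is_bad flag are hypothetical, and the sketch assumes old_table has no IDENTITY column and is not referenced by a foreign key.

```sql
-- Assumptions: col1, col2 and is_bad are hypothetical names; old_table has no
-- IDENTITY column and is not referenced by a foreign key (TRUNCATE TABLE fails
-- on a table that a foreign key references).
SET XACT_ABORT ON;   -- a run-time error rolls back the whole transaction

-- 1. Copy the rows to keep; SELECT ... INTO creates #keep.
SELECT col1, col2
INTO #keep
FROM old_table
WHERE is_bad = 0;

BEGIN TRANSACTION;

    -- 2. Empty the table. In SQL Server, TRUNCATE TABLE can be rolled back
    --    when it runs inside a transaction.
    TRUNCATE TABLE old_table;

    -- 3. Reload the good rows.
    INSERT INTO old_table (col1, col2)
    SELECT col1, col2
    FROM #keep;

COMMIT TRANSACTION;

DROP TABLE #keep;
```

The table is locked for the duration of the transaction, so this fits a maintenance window better than a busy production period.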
    
CodeMonkey answered Nov 05 '22 19:11


Not sure how good this would be, but what if you do the following (provided table_1 is a standalone table, I mean not referenced by any other table):

  1. create a duplicate of table_1, e.g. table_1_dup

  2. insert into table_1_dup select * from table_1 where condition1 <> 'value';

  3. drop table table_1

  4. exec sp_rename 'table_1_dup', 'table_1'
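
The four steps above might be sketched as a single script. Two details are assumptions on my part rather than part of the original steps: the dbo schema, and the extra IS NULL test, which keeps rows where condition1 is NULL (a plain <> comparison silently drops them).

```sql
-- table_1, table_1_dup and condition1 come from the steps above; the dbo schema is assumed.
-- SELECT ... INTO does not copy indexes, constraints, triggers or permissions;
-- those have to be re-created on the new table afterwards.
SET XACT_ABORT ON;

-- Steps 1 and 2: create and fill the copy in one statement.
SELECT *
INTO dbo.table_1_dup
FROM dbo.table_1
WHERE condition1 <> 'value' OR condition1 IS NULL;

BEGIN TRANSACTION;
    -- Step 3: drop the original.
    DROP TABLE dbo.table_1;
    -- Step 4: rename the copy (the new name is given without a schema prefix).
    EXEC sp_rename 'dbo.table_1_dup', 'table_1';
COMMIT TRANSACTION;
```

Rows written to table_1 between the copy and the drop would be lost, so writers should be stopped first.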

Rahul answered Nov 05 '22 17:11


If you cannot afford to get the database out of production while repairing, do it in small batches. See also: How to efficiently delete rows while NOT using Truncate Table in a 500,000+ rows table

If you are in a hurry and need the fastest way possible:

  • take the database out of production
  • drop all non-clustered indexes and triggers
  • delete the records (or if the majority of records are bad, copy+drop+rename the table)
  • (if applicable) fix the inconsistencies caused by the fact that you dropped triggers
  • re-create the indexes and triggers
  • bring the database back in production
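
A variation on the middle steps, sketched under assumptions: instead of dropping, the nonclustered index and the triggers are disabled and later re-enabled, which keeps their definitions in the database. IX_table_1_example is a hypothetical index name; dbo.table_1 and condition1 are taken from the question. Only nonclustered indexes should be disabled this way, because disabling the clustered index makes the table inaccessible.

```sql
-- IX_table_1_example is a hypothetical nonclustered index; repeat the ALTER INDEX
-- lines for each nonclustered index on the table. Do not disable the clustered index.
ALTER INDEX IX_table_1_example ON dbo.table_1 DISABLE;
DISABLE TRIGGER ALL ON dbo.table_1;

-- With no nonclustered indexes or triggers to maintain, the delete has less work to do.
DELETE FROM dbo.table_1
WHERE condition1 = 'value';

ENABLE TRIGGER ALL ON dbo.table_1;
ALTER INDEX IX_table_1_example ON dbo.table_1 REBUILD;   -- rebuilding re-enables the index
```

Any work the triggers would have done for the deleted rows still has to be reconciled by hand, as the list above notes.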
Ruud Helderman answered Nov 05 '22 18:11