How do I speed up deletes from a large database table?

Here's the problem I am trying to solve: I have recently completed a data layer redesign that allows me to load-balance my database across multiple shards. To keep the shards balanced, I need to be able to migrate data from one shard to another, which involves copying from shard A to shard B and then deleting the records from shard A. But I have several tables that are very big and have many foreign keys pointing to them, so deleting a single record from one of these tables can take more than one second.

In some cases I need to delete millions of records from the tables, and it just takes too long to be practical.

Disabling foreign keys is not an option. Deleting large batches of rows is also not an option, because this is a production application and large deletes lock too many resources, causing failures. I'm using SQL Server, and I know about partitioned tables, but the restrictions on partitioning (and the license fees for Enterprise Edition) make that approach unrealistic.

When I began working on this problem, I thought the hard part would be writing the algorithm that figures out how to delete rows from the leaf level up to the top of the data model, so that no foreign key constraints get violated along the way. But solving that problem did me no good, since it takes weeks to delete records that need to disappear overnight.

I already built in a way to mark data as virtually deleted, so as far as the application is concerned, the data is gone, but I'm still dealing with large data files, large backups, and slower queries because of the sheer size of the tables.

Any ideas? I have already read older related posts here and found nothing that would help.

asked Jul 21 '09 12:07

Eric Z Beard

1 Answer

Please see: Optimizing Delete on SQL Server

This MS support article might be of interest: How to resolve blocking problems that are caused by lock escalation in SQL Server:

Break up large batch operations into several smaller operations. For example, suppose you ran the following query to remove several hundred thousand old records from an audit table, and then you found that it caused a lock escalation that blocked other users:

    DELETE FROM LogMessages WHERE LogDate < '2/1/2002'

By removing these records a few hundred at a time, you can dramatically reduce the number of locks that accumulate per transaction and prevent lock escalation. For example:

    SET ROWCOUNT 500
    delete_more:
        DELETE FROM LogMessages WHERE LogDate < '2/1/2002'
    IF @@ROWCOUNT > 0 GOTO delete_more
    SET ROWCOUNT 0

Reduce the query's lock footprint by making the query as efficient as possible. Large scans or large numbers of Bookmark Lookups may increase the chance of lock escalation; additionally, they increase the chance of deadlocks and generally hurt concurrency and performance.
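
On newer versions of SQL Server, the same batching idea can be written with DELETE TOP instead of SET ROWCOUNT, since Microsoft documents SET ROWCOUNT as deprecated for DELETE, INSERT, and UPDATE statements. The following is only a sketch: LogMessages and LogDate are taken from the quoted article, while the batch size, the one-second pause, and the variable names are assumptions to tune for your own workload. It also assumes SQL Server 2008 or later for the inline DECLARE initializers.

```sql
-- Sketch only: table and column names come from the quoted article;
-- @BatchSize and the delay are assumed values, not recommendations.
DECLARE @BatchSize int = 500;
DECLARE @RowsDeleted int = 1;

WHILE @RowsDeleted > 0
BEGIN
    -- Outside an explicit transaction, each DELETE commits on its own,
    -- so its locks are released before the next batch starts.
    DELETE TOP (@BatchSize)
    FROM LogMessages
    WHERE LogDate < '20020201';  -- unambiguous yyyymmdd date literal

    -- Capture the count immediately; later statements reset @@ROWCOUNT.
    SET @RowsDeleted = @@ROWCOUNT;

    -- Optional pause so other sessions can get through between batches.
    IF @RowsDeleted > 0
        WAITFOR DELAY '00:00:01';
END
```

Keeping each batch small should keep the per-statement lock count well under the escalation threshold, and an index that supports the WHERE predicate (here, one on LogDate) helps each batch avoid a large scan.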

answered Sep 19 '22 13:09

Mitch Wheat

Tags: database, sql-server, scalability, sharding
