Why is my Azure SQL Database delete query performance so slow?

Tags:

I have an Azure Sql database about 9GB in size. It serves a web app that handles about 135K requests per hour. Most of the data is transient, it lives in the database from a few minutes to five days and is deleted. About 10GB moves through the database per day.

I tried to run a delete query on a table to delete about 250,000 records out of 350,000 records total. About 10 percent of the records have one or two nvarchar(max) values large enough to be stored in LOB storage.

Over the weekend, I tried to delete them all at once. It ran for four hours before I canceled the query, then it was rolling back for another 8 hours - bad move. I really wasn't expecting it to be that bad.

Then I tried another approach. This batch ran at night when the web app was handling about 100K requests per hour. tblJobs Id field is a uniqueidentifier that is the primary key.

insert @tableIds select Id from dbo.tblJobs with(nolock) 
where (datediff(day, SchedDate, getDate()) > 60)  
   or (datediff(day, ModifiedDate, getDate()) > 3 and ToBeRemoved = 1)

set @maintLogStr = 'uspMaintenance [tblJobs] Obsolete J records count @tableIds: ' + convert(nvarchar(12), (select count(1) from @tableIds))
insert dbo.admin_MaintenanceLog(LogEntry) values(@maintLogStr)

set @maintLogId = newid()
set @maintLogStr = 'uspMaintenance [tblJobs] Obsolete J records beginning loop...'
insert dbo.admin_MaintenanceLog(Id, LogEntry) values(@maintLogId, @maintLogStr)

while exists(select * from @tableIds)
begin
    delete @tableIdsTmp
    begin transaction
        insert @tableIdsTmp select top 1000 id from @tableIds
        delete p from @tableIdsTmp i join dbo.tblJobs p on i.id = p.Id
        delete x from @tableIdsTmp t join @tableIds x on t.id = x.id
        set @maintLogStr = 'uspMaintenance [tblJobs] Obsolete J records remaining count @tableIds: ' + convert(nvarchar(12), (select count(1) from @tableIds))
        update dbo.admin_MaintenanceLog set LogEntry = @maintLogStr, RecordCreated = getdate() where Id = @maintLogId
    commit transaction
    if @dowaits = 1 WAITFOR DELAY '00:00:01.000'
end

SchedDate, ModifiedDate and ToBeRemoved are not indexed so gathering the Ids in @tableIds took about 3 minutes - not bad.

Then from the log entries, it took 1 hour 55 minutes to delete 11,000 records from tblJobs at which time the job called from a remote machine timed out.

Why is it taking so long? What can I do to speed it up?

291

asked Sep 28 '15 18:09

RJBreneman

1 Answers

A lot of your performance will be tied to the reservation size you are using (as mentioned in the prior answers). However, you don't need to do the table variables in your code at all to achieve what you want. In fact, you should pretty much never use them when there is a join involved at all because they have no statistics on them (and thus will likely have poor plan choices when there are complex choices for the optimizer to make). You can read the official guidance on that here:table variables documentation.

So, if you step back and look at the core of what you are trying to do, you can do this: delete top(1000) dbo.TblJobs where (datediff(day, SchedDate, getDate()) > 60)
or (datediff(day, ModifiedDate, getDate()) > 3 and ToBeRemoved = 1)

You will likely get a table scan from this query because:

You are using a disjunction (OR) which makes it hard for the optimizer to find a single access path to retrieve your results quickly.
You have used a guid as your key (I think) - effectively generating ids randomly across the space of possible guids
putting predicates on the output of intrinsic functions makes it hard(er) for the optimizer to determine how to do smarter scans on indexes where it can set a range on the column.

When you do the scan, then you are potentially going to get locking issues since you have a workload that is running concurrently on the table. So, if some other request is doing a select statement you might block the update query as it scans through the table. (Posting query plans is really helpful to discussion scaling/concurrency issues, btw).

Furthermore, given that you have a loop where you are taking 1000 rows out of the table, copying them into a table variable, then eventually copying them into yet another and joining with the original table in the delete, you are turning a problem that was O(N) into O(N^2). Algorithmically, your query will likely get slower and slower the more rows you add into your table with this approach.

There are some things you can do to improve this query (potentially):

remove the table variables completely and use a loop with @@rowcount to determine if you updated anything
remove the logging from the same database (it competes for IO and you are already constrained there)
split the query predicates into two queries (where each part of the disjunction is in a separate query). This gives you a better chance to scan an index if you happen to have one on scheddate or modifieddate.
I'm not necessarily suggesting adding indexes on those two fields (as there are potential concurrency issues to do so), but you might try it as an experiment if you can do so safely without impacting your production workload.
Once you have done the change to split the query into the 2 queries, then consider changing the computation of the datediff to be outside the query - compute that once and pass in a value as a parameter (col < @param). This will let the optimizer match indexes if you have them
If you know something about the lifetime of the objects, you can potentially switch to using newsequentialid instead of newid (or just move to bigint) to get rid of the randomness in the creation of the field for the id. this will reduce b-tree fragmentation on the insert path and potentially open up more opportunities in the delete path (as scanning older values will likely be easier to do if you have a clustered index on the id and they are not being accessed by the other users since they are touching newer data, most likely).
you can use the readpast option to skip rows that are locked by other users - for this kind of pattern you would be mostly happy to do this unless they are all locked since you might end your loop early, but if you are running this cleanup regularly it should be ok. You can read about that hint here:readpast hint docs

Most performance tuning and analysis is aided by an understanding of the cost of each operation. Using "set statistics time on" and "set statistics io on" gives you good metrics to track the physical costs of the query. "set statistics profile on" is better for looking at the algorithmic cost to each query operator (for that N^2 problem).

Better late than never, but I hope this helps you (and others) to understand how to improve your SQL Azure performance if you hit a similar situation in the future.

answered Sep 19 '22 03:09

Conor Cunningham MSFT

Related questions
                            
                                += in SELECT clause; constant vs column [duplicate]
                            
                                Graph problems: connect by NOCYCLE prior replacement in SQL server?
                            
                                TSQL query returning double the rows with inner join of same table twice
                            
                                Understanding parallel TSQL connections
                            
                                SELECT SQL Variable - should i avoid using this syntax and always use SET?
                            
                                SQL update working not insert
                            
                                Sql server 2012 fetch vs old row_number performance. What am I missing? Why is row_number 17x faster?
                            
                                how to generate insert data script from #temp tables?
                            
                                Check constraint does not work on bulk insert for more than 250 records
                            
                                Combining two queries together to form month count
                            
                                How to "Implode" (de-normalize/concat) multiple columns into a single column?
                            
                                SQL Case statement VS Union All performance
                            
                                How to group rows by their DATEDIFF?
                            
                                Why DATEADD slows down SQL query?
                            
                                Splitting dates into intervals using Start Date and End Date
                            
                                T-SQL Decimal Multiplication
                            
                                "Incorrect syntax" error using OVER()
                            
                                T-SQL. Text of procedure call
                            
                                Outer query is running before inner query
                            
                                Implement a ring buffer

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Why is my Azure SQL Database delete query performance so slow?

Tags:

tsql

azure-sql-database

azure

RJBreneman

People also ask

1 Answers

Conor Cunningham MSFT

Recent Activity

Donate For Us