 

Why is my Azure SQL Database delete query performance so slow?

I have an Azure SQL database about 9 GB in size. It serves a web app that handles about 135K requests per hour. Most of the data is transient; it lives in the database for anywhere from a few minutes to five days before it is deleted. About 10 GB moves through the database per day.

I tried to run a delete query on a table to delete about 250,000 records out of 350,000 records total. About 10 percent of the records have one or two nvarchar(max) values large enough to be stored in LOB storage.

Over the weekend, I tried to delete them all at once. It ran for four hours before I canceled the query, and then it rolled back for another eight hours - bad move. I really wasn't expecting it to be that bad.

Then I tried another approach. This batch ran at night, when the web app was handling about 100K requests per hour. The tblJobs Id field is a uniqueidentifier and is the primary key.

-- declarations assumed for this snippet; the original post omits them
declare @tableIds    table (id uniqueidentifier)
declare @tableIdsTmp table (id uniqueidentifier)
declare @maintLogStr nvarchar(256)
declare @maintLogId  uniqueidentifier
declare @dowaits     bit = 1

-- gather the Ids of the obsolete job records
insert @tableIds select Id from dbo.tblJobs with(nolock) 
where (datediff(day, SchedDate, getDate()) > 60)  
   or (datediff(day, ModifiedDate, getDate()) > 3 and ToBeRemoved = 1)

set @maintLogStr = 'uspMaintenance [tblJobs] Obsolete J records count @tableIds: ' + convert(nvarchar(12), (select count(1) from @tableIds))
insert dbo.admin_MaintenanceLog(LogEntry) values(@maintLogStr)

set @maintLogId = newid()
set @maintLogStr = 'uspMaintenance [tblJobs] Obsolete J records beginning loop...'
insert dbo.admin_MaintenanceLog(Id, LogEntry) values(@maintLogId, @maintLogStr)

while exists(select * from @tableIds)
begin
    delete @tableIdsTmp  -- clear the working batch
    begin transaction
        -- take the next batch of 1000 ids to process
        insert @tableIdsTmp select top 1000 id from @tableIds
        -- delete the matching rows from tblJobs
        delete p from @tableIdsTmp i join dbo.tblJobs p on i.id = p.Id
        -- remove the processed ids from the work list
        delete x from @tableIdsTmp t join @tableIds x on t.id = x.id
        set @maintLogStr = 'uspMaintenance [tblJobs] Obsolete J records remaining count @tableIds: ' + convert(nvarchar(12), (select count(1) from @tableIds))
        update dbo.admin_MaintenanceLog set LogEntry = @maintLogStr, RecordCreated = getdate() where Id = @maintLogId
    commit transaction
    if @dowaits = 1 WAITFOR DELAY '00:00:01.000'
end

SchedDate, ModifiedDate, and ToBeRemoved are not indexed, so gathering the Ids into @tableIds took about three minutes - not bad.

Then, according to the log entries, it took 1 hour 55 minutes to delete 11,000 records from tblJobs, at which point the job, called from a remote machine, timed out.

Why is it taking so long? What can I do to speed it up?

Asked Sep 28 '15 by RJBreneman

People also ask

How can we improve the performance of a delete statement in SQL Server?

If you are deleting 95% of a table and keeping 5%, it can actually be quicker to move the rows you want to keep into a new table, drop the old table, and rename the new one. Or copy the keeper rows out, truncate the table, and then copy them back in.
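A rough T-SQL sketch of that pattern (the table and column names here are made up for illustration, and this assumes no foreign keys reference the table):

select * into #keep from dbo.SomeTable where KeepFlag = 1   -- copy the small set of rows to keep
truncate table dbo.SomeTable                                 -- truncate is fast and minimally logged
insert dbo.SomeTable select * from #keep                     -- put the keepers back (identity columns and constraints may need extra handling)
drop table #keep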

Why is my query taking so long?

Queries can become slow for various reasons ranging from improper index usage to bugs in the storage engine itself. However, in most cases, queries become slow because developers or MySQL database administrators neglect to monitor them and keep an eye on their performance.

How can I speed up SQL query execution?

Reduce nested views to reduce lag. Nesting causes too many data returns for every single query, which either makes the database crawl or give up entirely and return nothing. Minimizing nesting is a simple way to make your SQL query efficient and significantly improve speed.


1 Answer

A lot of your performance will be tied to the reservation size you are using (as mentioned in the prior answers). However, you don't need the table variables in your code at all to achieve what you want. In fact, you should pretty much never use them when a join is involved, because they have no statistics on them (and thus will likely produce poor plan choices when the optimizer has complex choices to make). You can read the official guidance on that here: table variables documentation.
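If an intermediate work list really is needed, a temporary table is usually a better fit than a table variable because it carries statistics the optimizer can use for join estimates. A minimal sketch, with illustrative names only:

create table #ids (id uniqueidentifier primary key)
insert #ids select Id from dbo.tblJobs where ToBeRemoved = 1
-- the join below can now use statistics on #ids when choosing a plan
delete p from #ids i join dbo.tblJobs p on i.id = p.Id
drop table #ids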

So, if you step back and look at the core of what you are trying to do, you can do this:

delete top (1000) dbo.TblJobs
where (datediff(day, SchedDate, getDate()) > 60)
   or (datediff(day, ModifiedDate, getDate()) > 3 and ToBeRemoved = 1)

You will likely get a table scan from this query because:

  • You are using a disjunction (OR), which makes it hard for the optimizer to find a single access path to retrieve your results quickly.
  • You have used a guid as your key (I think), effectively generating ids randomly across the space of possible guids.
  • Putting predicates on the output of intrinsic functions makes it hard(er) for the optimizer to determine how to do smarter scans on indexes where it can set a range on the column (see the short example after this list).
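To illustrate that last point, compare the two predicate forms below. They express roughly the same filter, but only the second lets the optimizer seek a range on an index over SchedDate (this rewrite is a sketch, not taken from the original procedure):

-- non-sargable: the function wraps the column, so an index on SchedDate cannot be range-scanned
where datediff(day, SchedDate, getDate()) > 60

-- sargable: the column is compared to a precomputed boundary
where SchedDate < dateadd(day, -60, cast(getdate() as date))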

When you do the scan, you are also potentially going to run into locking issues, since you have a workload running concurrently on the table. So, if some other request is doing a select statement, it might block the delete as it scans through the table. (Posting query plans is really helpful for discussing scaling/concurrency issues, by the way.)
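As an aside (not part of the original answer), one quick way to see whether the delete is being blocked while it runs is to query sys.dm_exec_requests for sessions with a non-zero blocking_session_id:

select session_id, blocking_session_id, wait_type, wait_time, command
from sys.dm_exec_requests
where blocking_session_id <> 0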

Furthermore, given that your loop repeatedly copies 1000 ids from one table variable into another, joins that batch against the original table for the delete, and then deletes the processed ids back out of the first table variable, you are turning a problem that was O(N) into O(N^2). Algorithmically, your query will likely get slower and slower the more rows you feed through this approach.

There are some things you can do to improve this query (potentially):

  • Remove the table variables completely and use a loop with @@rowcount to determine whether you deleted anything (a sketch combining several of these ideas follows this list).
  • Remove the logging from the same database (it competes for IO, and you are already constrained there).
  • Split the query predicates into two queries (one for each part of the disjunction). This gives you a better chance of scanning an index if you happen to have one on SchedDate or ModifiedDate.
  • I'm not necessarily suggesting adding indexes on those two fields (there are potential concurrency issues in doing so), but you might try it as an experiment if you can do so safely without impacting your production workload.
  • Once you have split the query into the two queries, consider moving the datediff computation outside the query - compute it once and pass the value in as a parameter (col < @param). This will let the optimizer match indexes if you have them.
  • If you know something about the lifetime of the objects, you can potentially switch to using newsequentialid instead of newid (or just move to bigint) to get rid of the randomness in the generation of the id values. This will reduce b-tree fragmentation on the insert path and potentially open up more opportunities in the delete path, since scanning older values will likely be easier if you have a clustered index on the id and other users are not touching those rows (they are most likely accessing newer data).
  • You can use the readpast option to skip rows that are locked by other users - for this kind of pattern you would mostly be happy to do this unless they are all locked, since you might end your loop early, but if you run this cleanup regularly it should be fine. You can read about that hint here: readpast hint docs.
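Putting several of those suggestions together, here is a hedged sketch of what the cleanup loop might look like - the disjunction split into two statements, the cutoffs computed once, @@rowcount driving the loop, and readpast skipping locked rows (variable names are illustrative):

declare @schedCutoff datetime = dateadd(day, -60, cast(getdate() as date))
declare @modCutoff   datetime = dateadd(day, -3,  cast(getdate() as date))
declare @rows int = 1

-- pass 1: jobs whose scheduled date is more than 60 days old
while @rows > 0
begin
    delete top (1000) from dbo.tblJobs with (readpast)
    where SchedDate < @schedCutoff
    set @rows = @@rowcount
    waitfor delay '00:00:01'
end

-- pass 2: jobs flagged for removal and not modified for more than 3 days
set @rows = 1
while @rows > 0
begin
    delete top (1000) from dbo.tblJobs with (readpast)
    where ModifiedDate < @modCutoff and ToBeRemoved = 1
    set @rows = @@rowcount
    waitfor delay '00:00:01'
end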

Most performance tuning and analysis is aided by an understanding of the cost of each operation. Using "set statistics time on" and "set statistics io on" gives you good metrics for tracking the physical costs of a query. "set statistics profile on" is better for looking at the algorithmic cost of each query operator (for that N^2 problem).
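For example, to capture those metrics for one run (the commented line is a placeholder for whichever statement you want to measure):

set statistics time on
set statistics io on
set statistics profile on   -- per-operator row counts and execution counts

-- run the statement you want to measure here

set statistics profile off
set statistics io off
set statistics time off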

Better late than never, but I hope this helps you (and others) to understand how to improve your SQL Azure performance if you hit a similar situation in the future.

Answered Sep 19 '22 by Conor Cunningham MSFT