SQL Server - READPAST, UPDLOCK update method?

We're in need of yet another massive update which, as it stands, would require downtime because of the risk of extensive locking problems. Basically, we'd like to update hundreds of millions of rows during business hours.

Now, reducing the updates to manageable batch sizes of < 5000 rows helps, but I was wondering if it was feasible to create a template that would only read and lock available rows, update them, and move on to the next batch. The idea is that this way we could patch some 95% of the data with minimal risk, after which the remaining set of data would be small enough to update all at once during a slower period while watching out for locks.

Yes, I know this sounds weird, but bear with me. How would one go about doing this?

I was thinking of something like this:

WHILE @@ROWCOUNT > 0   -- assumes @@ROWCOUNT is non-zero on entry; seed it before the loop if needed
BEGIN
   UPDATE TOP (5000) T
   SET T.VALUE = 'ASD'
   FROM MYTABLE T
   JOIN (SELECT TOP (5000) S.ID
      FROM MYTABLE S WITH (READPAST, UPDLOCK)   -- skip rows locked elsewhere, U-lock the rest
      WHERE X = Y AND Z = W etc...) SRC         -- must exclude already-updated rows, or the loop never ends
         ON SRC.ID = T.ID
END

Any ideas? Basically, the last thing I want is for this query to get stuck behind other potentially long-running transactions, or to do the same to them in return. So what I'm looking for is a script that skips locked rows, updates what it can with minimal risk of blocking or deadlocks, and can be safely run for an hour or so during uptime.

Kahn asked Oct 01 '22
1 Answer

For a single-table update, just add WITH (READPAST) as a table hint:

UPDATE TOP (5000) MYTABLE WITH (READPAST)   -- skip rows other transactions have locked
SET VALUE = 'ASD'
WHERE X = Y AND Z = W etc...

If you are lucky enough to have only a single table involved, READPAST skips rows that other transactions have locked, and the UPDATE itself will take exclusive locks on just the rows it actually updates.
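For completeness, here is a minimal sketch (not part of the original answer) of how this could be wrapped in the batching loop from the question. MYTABLE, VALUE, X, Y, Z, and W are the question's placeholders, and the AND VALUE <> 'ASD' filter is an assumed way of making each batch exclude already-updated rows so the loop terminates:

DECLARE @rows INT = 1;          -- seed so the loop body runs at least once

WHILE @rows > 0
BEGIN
    UPDATE TOP (5000) MYTABLE WITH (READPAST)
    SET VALUE = 'ASD'
    WHERE X = Y AND Z = W
      AND VALUE <> 'ASD';       -- assumed progress filter: skip rows already patched

    SET @rows = @@ROWCOUNT;     -- 0 when no qualifying unlocked rows remain
END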

If more than one table is involved, it becomes more complicated. Also be very careful with the WHERE clause, because it can add more load than expected: the first few batches are cheap, but each subsequent batch gets progressively more expensive if the whole table has to be scanned to find enough qualifying rows to satisfy the TOP. You might also want to set a short lock timeout for each batch.
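One way to do that (a sketch, not from the original answer) is SET LOCK_TIMEOUT, which makes a statement fail with error 1222 instead of waiting indefinitely. This matters because READPAST only skips row- and page-level locks; the statement can still end up waiting on a coarser lock, such as an escalated table lock held by another transaction:

SET LOCK_TIMEOUT 2000;   -- give up after 2 seconds of waiting on any lock

BEGIN TRY
    UPDATE TOP (5000) MYTABLE WITH (READPAST)
    SET VALUE = 'ASD'
    WHERE X = Y AND Z = W;
END TRY
BEGIN CATCH
    IF ERROR_NUMBER() = 1222   -- "Lock request time out period exceeded"
    BEGIN
        PRINT 'Batch timed out waiting on a lock; skip and retry later';
    END
    ELSE
    BEGIN
        THROW;                 -- re-raise anything unexpected
    END
END CATCH;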

Ross Bradbury answered Oct 17 '22