I want to update a table in SQL Server by setting a FLAG column to 1 for all values since the beginning of the year:
TABLE
DATE ID FLAG (more columns...)
2016/01/01 1 0 ...
2016/01/01 2 0 ...
2016/01/02 3 0 ...
2016/01/02 4 0 ...
(etc)
Problem is that this table contains hundreds of millions of records and I've been advised to chunk the updates 100,000 rows at a time to avoid blocking other processes.
I need to remember which rows I update because there are background processes which immediately flip the FLAG back to 0 once they're done processing it.
Does anyone have suggestions on how I can do this? Each day's worth of data has over a million records, so I can't simply loop using the DATE as a counter. I am thinking of using the ID
UPDATE in Bulk It's a faster update than a row by row operation, but this is best used when updating limited rows. A bulk update is an expensive operation in terms of query cost, because it takes more resources for the single update operation. It also takes time for the update to be logged in the transaction log.
There are a couple of ways to do it. INSERT INTO students (id, score1, score2) VALUES (1, 5, 8), (2, 10, 8), (3, 8, 3), (4, 10, 7) ON DUPLICATE KEY UPDATE score1 = VALUES(score1), score2 = VALUES(score2);
Assuming the date column and the ID column are sequential you could do a simple loop. By this I mean that if there is a record id=1 and date=2016-1-1
then record id=2 date=2015-12-31
could not exist. If you are worried about locks/exceptions you should add a transaction in the WHILE
block and commit or rollback on failure.
Change the @batchSize
to whatever you feel is right after some experimentation.
DECLARE @currentId int, @maxId int, @batchSize int = 10000
SELECT @currentId = MIN(ID), @maxId = MAX(ID) FROM YOURTABLE WHERE DATE >= '2016-01-01'
WHILE @currentId < @maxId
BEGIN
UPDATE YOURTABLE SET FLAG = 1 WHERE ID BETWEEN @currentId AND (@currentId + @batchSize)
SET @currentId = @currentId + @batchSize
END
As this as the update will never flag the same record to 1 twice I do not see a need to track which records were touched unless you are going to manually stop the process partway through.
You should also ensure that the ID column has an index on it so the retrieval is fast in each update statement.
Looks like very simple question or maybe I'm missing something in it.
You can create a temp/permanent table to keep track of updated rows.
create tbl (Id int) -- or temp table based on your case
insert into tbl values (0)
declare @lastId int = (select Id from tbl)
;with cte as (
select top 100000
from YourMainTable
where Id > @lastId
ORDER BY Id
)
update cte
set Flag = 1
update tbl set Id = @lastId + 100000
You can do this process in a loop (except the table creation part)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With