Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to chunk updates to SQL Server?

Tags:

sql-server

I want to update a table in SQL Server by setting a FLAG column to 1 for all values since the beginning of the year:

TABLE
DATE        ID     FLAG   (more columns...)
2016/01/01  1      0      ...
2016/01/01  2      0      ...
2016/01/02  3      0      ...
2016/01/02  4      0      ...
(etc)

Problem is that this table contains hundreds of millions of records and I've been advised to chunk the updates 100,000 rows at a time to avoid blocking other processes.

I need to remember which rows I update because there are background processes which immediately flip the FLAG back to 0 once they're done processing it.

Does anyone have suggestions on how I can do this? Each day's worth of data has over a million records, so I can't simply loop using the DATE as a counter. I am thinking of using the ID

like image 680
Jin Kim Avatar asked Jun 09 '16 18:06

Jin Kim


People also ask

Can we do bulk update in SQL?

UPDATE in Bulk It's a faster update than a row by row operation, but this is best used when updating limited rows. A bulk update is an expensive operation in terms of query cost, because it takes more resources for the single update operation. It also takes time for the update to be logged in the transaction log.

How do I update multiple rows in SQL with update?

There are a couple of ways to do it. INSERT INTO students (id, score1, score2) VALUES (1, 5, 8), (2, 10, 8), (3, 8, 3), (4, 10, 7) ON DUPLICATE KEY UPDATE score1 = VALUES(score1), score2 = VALUES(score2);


2 Answers

Assuming the date column and the ID column are sequential you could do a simple loop. By this I mean that if there is a record id=1 and date=2016-1-1 then record id=2 date=2015-12-31 could not exist. If you are worried about locks/exceptions you should add a transaction in the WHILE block and commit or rollback on failure.

Change the @batchSize to whatever you feel is right after some experimentation.

DECLARE @currentId int, @maxId int, @batchSize int = 10000

SELECT @currentId = MIN(ID), @maxId = MAX(ID) FROM YOURTABLE WHERE DATE >= '2016-01-01'

WHILE @currentId < @maxId
BEGIN
    UPDATE YOURTABLE SET FLAG = 1 WHERE ID BETWEEN @currentId AND (@currentId + @batchSize)
    SET @currentId = @currentId + @batchSize
END

As this as the update will never flag the same record to 1 twice I do not see a need to track which records were touched unless you are going to manually stop the process partway through.

You should also ensure that the ID column has an index on it so the retrieval is fast in each update statement.

like image 134
Igor Avatar answered Nov 15 '22 03:11

Igor


Looks like very simple question or maybe I'm missing something in it.

You can create a temp/permanent table to keep track of updated rows.

create tbl (Id int) -- or temp table based on your case
insert into tbl values (0)

declare @lastId int = (select Id from tbl)

;with cte as (
    select top 100000 
    from YourMainTable
    where Id > @lastId
    ORDER BY Id
)
update cte 
set Flag = 1

update tbl set Id = @lastId + 100000

You can do this process in a loop (except the table creation part)

like image 24
FLICKER Avatar answered Nov 15 '22 04:11

FLICKER