Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Best way to update 40 million rows in batch

Basically I need to run this on a table with 40 million rows, updating every row at once will crash, so I want to batch the query so that if it crash, it can re-run the query and it would skip the finished batch and just continue with the ones left over.

UPDATE [table]
SET [New_ID] = [Old_ID]

What is the fastest way to do this? Here is how the table is created:

CREATE TABLE [table](
    [INSTANCE_ID] [int] NOT NULL,
    [table_ID] [bigint] IDENTITY(1,1) NOT NULL,
    [old_ID] [bigint] NOT NULL,
    [new_ID] [bigint] NOT NULL,
    [owner_ID] [int] NOT NULL,
    [created_time] [datetime] NULL
) ON [PRIMARY]

There are also indexes on created_time, owner_ID.

EDIT: My update statement is EXACTLY as shown, I literally just need to copy every entry in old_id into new_id for 40 million rows.

like image 570
Bill Software Engineer Avatar asked Sep 09 '16 20:09

Bill Software Engineer


3 Answers

Declare @Rowcount INT = 1;

WHILE (@Rowcount > 0)   
BEGIN
        UPDATE TOP (100000) [table]   --<-- define Batch Size in TOP Clause
           SET [New_ID] = [Old_ID]
        WHERE [New_ID] <> [Old_ID]

        SET @Rowcount = @@ROWCOUNT;

       CHECKPOINT;   --<-- to commit the changes with each batch
END
like image 156
M.Ali Avatar answered Nov 13 '22 21:11

M.Ali


M.Ali's suggestion will work, but you will end up with degrading performance as you work through the 40M records. I would suggest a better filter to find the records to update in each pass. This would assume you have a primary key (or other index) on your identity column:

DECLARE @Rowcount INT = 1
    ,   @BatchSize INT = 100000
    ,   @StartingRecord BIGINT = 1;

WHILE (@Rowcount > 0)   
BEGIN
    UPDATE [table]
        SET [New_ID] = [Old_ID]
    WHERE [table_ID] BETWEEN @StartingRecord AND @StartingRecord + @BatchSize - 1;

    SET @Rowcount = @@ROWCOUNT;

    CHECKPOINT;

    SELECT @StartingRecord += @BatchSize
END

This approach will allow each iteration to be as fast as the first. And if you don't have a valid index you need to fix that first.

like image 6
btberry Avatar answered Nov 13 '22 21:11

btberry


Select 1;  -- this will set a rowcount
WHILE (@@Rowcount > 0)   
BEGIN
  UPDATE TOP (1000000) [table]   
    SET [New_ID] =  [Old_ID]
  WHERE [New_ID] <> [Old_ID] 
    or ([New_ID] is null and [Old_ID] is not null)
END

100000 may work better for the top.

Since NewID and OldID is not null then the is null check is not necessary.

like image 3
paparazzo Avatar answered Nov 13 '22 19:11

paparazzo