 

Fastest way to update 120 Million records

I need to initialize a new field with the value -1 in a 120 Million record table.

UPDATE table SET int_field = -1;

I let it run for 5 hours before canceling it.

I tried running it with the transaction isolation level set to read uncommitted, with the same results.

Recovery Model = Simple. MS SQL Server 2005 

Any advice on getting this done faster?

asked Sep 14 '10 by Bob Probst

People also ask

How do you update a million records?

One of my favorite ways of dealing with millions of records in a table is processing inserts, deletes, or updates in batches. Updating data in batches of 10,000 records at a time and using a transaction is a simple and efficient way of performing updates on millions of records.
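For example, a minimal sketch of that pattern against the table from the question, with one transaction per 10,000-row batch (dbo.BigTable is a placeholder name; int_field is the new column from the question):

DECLARE @Rows INT;
SET @Rows = 1; -- initialize just to enter the loop
WHILE (@Rows > 0)
BEGIN
    BEGIN TRANSACTION;
    UPDATE TOP (10000) dbo.BigTable
    SET int_field = -1
    WHERE int_field IS NULL OR int_field <> -1; -- only touch rows not yet updated
    SET @Rows = @@ROWCOUNT;
    COMMIT TRANSACTION;
END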

How do you update thousands of records in SQL?

DECLARE @Rows INT, @BatchSize INT; -- keep below 5000 to be safe
SET @BatchSize = 2000;
SET @Rows = @BatchSize; -- initialize just to enter the loop
BEGIN TRY
  WHILE (@Rows = @BatchSize)
  BEGIN
    UPDATE TOP (@BatchSize) tab
    SET tab.Value = 'abc1'
    FROM TableName tab
    WHERE tab.Parameter1 = 'abc'
      AND tab.Value <> 'abc1'; -- skip rows already updated so the loop terminates
    SET @Rows = @@ROWCOUNT;
  END
END TRY
BEGIN CATCH
  SELECT ERROR_MESSAGE() -- inspect/handle the error here
END CATCH


1 Answer

The only sane way to update a table of 120M records is with a SELECT statement that populates a second table. You have to take care when doing this. Instructions below.


Simple Case

For a table w/out a clustered index, during a time w/out concurrent DML:

  • SELECT *, new_col = -1 INTO clone.BaseTable FROM dbo.BaseTable
  • recreate indexes, constraints, etc on new table
  • switch old and new w/ ALTER SCHEMA ... TRANSFER.
  • drop old table (a full sketch of these steps follows below)

If you can't create a clone schema, a different table name in the same schema will do. Remember to rename all your constraints and triggers (if applicable) after the switch.
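Putting those steps together, a minimal end-to-end sketch (assumes the clone schema already exists; backup_20100914 is the placeholder backup schema used in the script further below):

-- 1. copy the data and add the new column in one pass
SELECT *, new_col = -1 INTO clone.BaseTable FROM dbo.BaseTable
GO
-- 2. recreate indexes, constraints, etc. on clone.BaseTable here
-- 3. switch old and new
ALTER SCHEMA backup_20100914 TRANSFER dbo.BaseTable
ALTER SCHEMA dbo TRANSFER clone.BaseTable
GO
-- 4. drop the old table once you're confident
-- DROP TABLE backup_20100914.BaseTable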


Non-simple Case

First, recreate your BaseTable with the same name under a different schema, eg clone.BaseTable. Using a separate schema will simplify the rename process later. (A minimal clone definition is sketched after the list below.)

  • Include the clustered index, if applicable. Remember that primary keys and unique constraints may be clustered, but not necessarily so.
  • Include identity columns and computed columns, if applicable.
  • Include your new INT column, wherever it belongs.
  • Do not include any of the following:
    • triggers
    • foreign key constraints
    • non-clustered indexes/primary keys/unique constraints
    • check constraints or default constraints. Defaults don't make much of a difference, but we're trying to keep things minimal.
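For illustration, a hypothetical clone definition; Col1/Col2/Col3 stand in for your real columns, with Col1 an identity clustered primary key and Col3 the new INT column:

CREATE TABLE clone.BaseTable
(
    Col1 INT IDENTITY   NOT NULL, -- identity column, mirrored from dbo.BaseTable
    Col2 VARCHAR(50)    NOT NULL,
    Col3 INT            NOT NULL, -- the new column
    CONSTRAINT PK_BaseTable PRIMARY KEY CLUSTERED (Col1) -- clustered index only; nothing else yet
)
GO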

Then, test your insert w/ 1000 rows:

-- assuming an IDENTITY column in BaseTable
SET IDENTITY_INSERT clone.BaseTable ON
GO
INSERT clone.BaseTable WITH (TABLOCK) (Col1, Col2, Col3)
SELECT TOP 1000 Col1, Col2, Col3 = -1
FROM dbo.BaseTable
GO
SET IDENTITY_INSERT clone.BaseTable OFF

Examine the results. If everything appears in order:

  • truncate the clone table
  • make sure the database is in the bulk-logged or simple recovery model
  • perform the full insert (sketched below).

This will take a while, but not nearly as long as an update. Once it completes, check the data in the clone table to make sure everything is correct.
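The full insert is the test statement above without the TOP clause:

SET IDENTITY_INSERT clone.BaseTable ON
GO
INSERT clone.BaseTable WITH (TABLOCK) (Col1, Col2, Col3)
SELECT Col1, Col2, Col3 = -1
FROM dbo.BaseTable
GO
SET IDENTITY_INSERT clone.BaseTable OFF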

Then, recreate all non-clustered primary keys/unique constraints/indexes and foreign key constraints (in that order). Recreate default and check constraints, if applicable. Recreate all triggers. Recreate each constraint, index or trigger in a separate batch. eg:

ALTER TABLE clone.BaseTable ADD CONSTRAINT UQ_BaseTable UNIQUE (Col2)
GO
-- next constraint/index/trigger definition here

Finally, move dbo.BaseTable to a backup schema and clone.BaseTable to the dbo schema (or wherever your table is supposed to live).

--
-- perform first true-up operation here, if necessary
-- EXEC clone.BaseTable_TrueUp
-- GO
--
-- create a backup schema, if necessary
-- CREATE SCHEMA backup_20100914
-- GO
BEGIN TRY
  BEGIN TRANSACTION
  ALTER SCHEMA backup_20100914 TRANSFER dbo.BaseTable
  --
  -- perform second true-up operation here, if necessary
  -- EXEC clone.BaseTable_TrueUp
  ALTER SCHEMA dbo TRANSFER clone.BaseTable
  COMMIT TRANSACTION
END TRY
BEGIN CATCH
  SELECT ERROR_MESSAGE() -- add more info here if necessary
  ROLLBACK TRANSACTION
END CATCH
GO

If you need to free up disk space, you may drop your original table at this time, though it may be prudent to keep it around a while longer.

Needless to say, this is ideally an offline operation. If you have people modifying data while you perform this operation, you will have to perform a true-up operation with the schema switch. I recommend creating a trigger on dbo.BaseTable to log all DML to a separate table. Enable this trigger before you start the insert. Then in the same transaction that you perform the schema transfer, use the log table to perform a true-up. Test this first on a subset of the data! Deltas are easy to screw up.
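A minimal sketch of such a logging trigger, assuming a single-column integer key Col1; the log table and all names here are hypothetical:

-- hypothetical DML log table keyed on BaseTable's Col1
CREATE TABLE dbo.BaseTable_DmlLog
(
    LogId    INT IDENTITY PRIMARY KEY,
    Action   CHAR(1)  NOT NULL,               -- 'I'nsert, 'U'pdate or 'D'elete
    Col1     INT      NOT NULL,               -- key of the affected row
    LoggedAt DATETIME NOT NULL DEFAULT GETDATE()
)
GO
CREATE TRIGGER dbo.trg_BaseTable_Log
ON dbo.BaseTable
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    -- rows present in both inserted and deleted are updates
    INSERT dbo.BaseTable_DmlLog (Action, Col1)
    SELECT CASE WHEN d.Col1 IS NULL THEN 'I'
                WHEN i.Col1 IS NULL THEN 'D'
                ELSE 'U' END,
           COALESCE(i.Col1, d.Col1)
    FROM inserted i
    FULL OUTER JOIN deleted d ON d.Col1 = i.Col1
END
GO

The true-up procedure then replays the logged rows against clone.BaseTable inside the same transaction as the schema transfer.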

answered by Peter Radocchia