I need to initialize a new field with the value -1 in a 120 Million record table.
Update table set int_field = -1;
I let it run for 5 hours before canceling it.
I tried running it with the transaction isolation level set to READ UNCOMMITTED, with the same results.
Recovery Model = Simple. MS SQL Server 2005
Any advice on getting this done faster?
One of my favorite ways of dealing with millions of records in a table is to process inserts, deletes, or updates in batches. Updating a few thousand rows per batch keeps each transaction small, which avoids lock escalation and runaway transaction log growth, and it is a simple, efficient way to update millions of records.
DECLARE @Rows INT,
        @BatchSize INT; -- keep below 5000 to be safe (larger batches risk lock escalation to a table lock)

SET @BatchSize = 2000;
SET @Rows = @BatchSize; -- initialize just to enter the loop

BEGIN TRY
    WHILE (@Rows = @BatchSize)
    BEGIN
        UPDATE TOP (@BatchSize) tab
        SET    tab.Value = 'abc1'
        FROM   TableName tab
        WHERE  tab.Parameter1 = 'abc'
          AND  tab.Value <> 'abc1'; -- illustrative second filter: skip rows already updated so the loop terminates

        SET @Rows = @@ROWCOUNT; -- a batch smaller than @BatchSize means no matching rows remain
    END
END TRY
BEGIN CATCH
    SELECT ERROR_MESSAGE(); -- log or re-raise the error as appropriate
END CATCH
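Applied to the question's scenario, the same pattern might look like the sketch below (the table name dbo.BigTable is a placeholder; since the new column starts out NULL, the WHERE clause shrinks the remaining work on every pass and the loop naturally terminates):

DECLARE @Rows INT, @BatchSize INT;
SET @BatchSize = 2000;
SET @Rows = @BatchSize;

WHILE (@Rows = @BatchSize)
BEGIN
    UPDATE TOP (@BatchSize) tab
    SET    tab.int_field = -1
    FROM   dbo.BigTable tab        -- placeholder name for the 120M-row table
    WHERE  tab.int_field IS NULL;  -- only rows not yet initialized

    SET @Rows = @@ROWCOUNT;

    CHECKPOINT; -- with the SIMPLE recovery model, this lets log space be reused between batches
END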
The only sane way to update a table of 120M records is with a SELECT statement that populates a second table. You have to take care when doing this; instructions below.
Simple Case
For a table w/out a clustered index, during a time w/out concurrent DML:
SELECT *, new_col = -1
INTO clone.BaseTable
FROM dbo.BaseTable
If you can't create a clone schema, a different table name in the same schema will do. Remember to rename all your constraints and triggers (if applicable) after the switch.
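A sketch of that rename, assuming the copy was built as dbo.BaseTable_new and carries a constraint named UQ_BaseTable_new (all of these names are placeholders):

EXEC sp_rename 'dbo.BaseTable', 'BaseTable_old';                 -- move the original aside
EXEC sp_rename 'dbo.BaseTable_new', 'BaseTable';                 -- promote the copy
EXEC sp_rename 'dbo.UQ_BaseTable_new', 'UQ_BaseTable', 'OBJECT'; -- restore the standard constraint name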
Non-simple Case
First, recreate your BaseTable with the same name under a different schema, e.g. clone.BaseTable. Using a separate schema will simplify the rename process later.
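A minimal sketch of that step, assuming Col1 is an IDENTITY key, Col2 stands in for the existing columns, and Col3 is the new field (the real definition must match dbo.BaseTable column for column, plus the new column):

CREATE SCHEMA clone AUTHORIZATION dbo
GO

CREATE TABLE clone.BaseTable
(
    Col1 INT IDENTITY(1,1) NOT NULL, -- same IDENTITY seed/increment as the original
    Col2 VARCHAR(50)       NOT NULL, -- placeholder for the existing columns
    Col3 INT               NOT NULL  -- the new field, to be populated with -1
)
GO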
Then, test your insert w/ 1000 rows:
-- assuming an IDENTITY column in BaseTable
SET IDENTITY_INSERT clone.BaseTable ON
GO
INSERT clone.BaseTable WITH (TABLOCK) (Col1, Col2, Col3)
SELECT TOP 1000 Col1, Col2, Col3 = -1
FROM dbo.BaseTable
GO
SET IDENTITY_INSERT clone.BaseTable OFF
Examine the results. If everything appears in order, run the full-size insert (the same statement as above, without the TOP 1000):
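-- full-size insert: identical to the test insert above, minus the TOP clause
SET IDENTITY_INSERT clone.BaseTable ON
GO
INSERT clone.BaseTable WITH (TABLOCK) (Col1, Col2, Col3)
SELECT Col1, Col2, Col3 = -1
FROM dbo.BaseTable
GO
SET IDENTITY_INSERT clone.BaseTable OFF
GO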
This will take a while, but not nearly as long as an update. Once it completes, check the data in the clone table to make sure everything is correct.
Then, recreate all non-clustered primary keys/unique constraints/indexes and foreign key constraints (in that order). Recreate default and check constraints, if applicable. Recreate all triggers. Recreate each constraint, index, or trigger in a separate batch. For example:
ALTER TABLE clone.BaseTable
    ADD CONSTRAINT UQ_BaseTable UNIQUE (Col2)
GO
-- next constraint/index/trigger definition here
Finally, move dbo.BaseTable to a backup schema and clone.BaseTable to the dbo schema (or wherever your table is supposed to live).
-- perform first true-up operation here, if necessary
-- EXEC clone.BaseTable_TrueUp
-- GO

-- create a backup schema, if necessary
-- CREATE SCHEMA backup_20100914
-- GO

BEGIN TRY
    BEGIN TRANSACTION

    ALTER SCHEMA backup_20100914 TRANSFER dbo.BaseTable

    -- perform second true-up operation here, if necessary
    -- EXEC clone.BaseTable_TrueUp

    ALTER SCHEMA dbo TRANSFER clone.BaseTable

    COMMIT TRANSACTION
END TRY
BEGIN CATCH
    SELECT ERROR_MESSAGE() -- add more info here if necessary
    ROLLBACK TRANSACTION
END CATCH
GO
If you need to free up disk space, you may drop your original table at this time, though it may be prudent to keep it around a while longer.
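For example, once you are sure nothing more needs to be recovered from it (the schema name follows the example above):

DROP TABLE backup_20100914.BaseTable
GO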
Needless to say, this is ideally an offline operation. If you have people modifying data while you perform this operation, you will have to perform a true-up operation with the schema switch. I recommend creating a trigger on dbo.BaseTable to log all DML to a separate table. Enable this trigger before you start the insert. Then, in the same transaction in which you perform the schema transfer, use the log table to perform a true-up. Test this first on a subset of the data! Deltas are easy to screw up.
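A minimal sketch of such a logging trigger, assuming Col1 is the primary key of BaseTable (the log table and trigger names are placeholders; a true-up procedure would replay the logged keys into clone.BaseTable):

CREATE TABLE dbo.BaseTable_DmlLog
(
    LogID     INT IDENTITY(1,1) PRIMARY KEY,
    Col1      INT      NOT NULL,                 -- key of the affected row
    Operation CHAR(1)  NOT NULL,                 -- 'I', 'U' or 'D'
    LoggedAt  DATETIME NOT NULL DEFAULT GETDATE()
)
GO

CREATE TRIGGER dbo.trg_BaseTable_Log
ON dbo.BaseTable
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;

    -- rows added or changed
    INSERT dbo.BaseTable_DmlLog (Col1, Operation)
    SELECT i.Col1, CASE WHEN d.Col1 IS NULL THEN 'I' ELSE 'U' END
    FROM inserted i
    LEFT JOIN deleted d ON d.Col1 = i.Col1;

    -- rows removed
    INSERT dbo.BaseTable_DmlLog (Col1, Operation)
    SELECT d.Col1, 'D'
    FROM deleted d
    LEFT JOIN inserted i ON i.Col1 = d.Col1
    WHERE i.Col1 IS NULL;
END
GO

-- the trigger can be switched off and on around testing with:
-- DISABLE TRIGGER dbo.trg_BaseTable_Log ON dbo.BaseTable
-- ENABLE  TRIGGER dbo.trg_BaseTable_Log ON dbo.BaseTable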