 

Fastest way to update 120 Million records

I need to initialize a new field with the value -1 in a 120 Million record table.

UPDATE table SET int_field = -1;

I let it run for 5 hours before canceling it.

I tried running it with the transaction isolation level set to read uncommitted, with the same results.

Recovery Model = Simple. MS SQL Server 2005 

Any advice on getting this done faster?

asked Sep 14 '10 by Bob Probst

People also ask

How do you update a million records?

One of my favorite ways of dealing with millions of records in a table is processing inserts, deletes, or updates in batches. Updating data in batches of 10,000 records at a time and using a transaction is a simple and efficient way of performing updates on millions of records.
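For example, a minimal sketch of that pattern against the table from the question, with one transaction per 10,000-row batch (dbo.BigTable is a placeholder name; int_field is the new column from the question):

DECLARE @Rows INT;
SET @Rows = 1; -- initialize just to enter the loop
WHILE (@Rows > 0)
BEGIN
    BEGIN TRANSACTION;
    UPDATE TOP (10000) dbo.BigTable
    SET int_field = -1
    WHERE int_field IS NULL OR int_field <> -1; -- only touch rows not yet updated
    SET @Rows = @@ROWCOUNT;
    COMMIT TRANSACTION;
END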

How do you update thousands of records in SQL?

DECLARE @Rows INT, @BatchSize INT; -- keep below 5000 to be safe
SET @BatchSize = 2000;
SET @Rows = @BatchSize; -- initialize just to enter the loop
BEGIN TRY
  WHILE (@Rows = @BatchSize)
  BEGIN
    UPDATE TOP (@BatchSize) tab
    SET tab.Value = 'abc1'
    FROM TableName tab
    WHERE tab.Parameter1 = 'abc'
      AND tab.Value <> 'abc1'; -- skip rows already updated so the loop terminates
    SET @Rows = @@ROWCOUNT;
  END
END TRY
BEGIN CATCH
  SELECT ERROR_MESSAGE() -- inspect/handle the error here
END CATCH


1 Answer

The only sane way to update a table of 120M records is with a SELECT statement that populates a second table. You have to take care when doing this. Instructions below.


Simple Case

For a table w/out a clustered index, during a time w/out concurrent DML:

  • SELECT *, new_col = -1 INTO clone.BaseTable FROM dbo.BaseTable
  • recreate indexes, constraints, etc on new table
  • switch old and new w/ ALTER SCHEMA ... TRANSFER.
  • drop old table (a full sketch of these steps follows below)

If you can't create a clone schema, a different table name in the same schema will do. Remember to rename all your constraints and triggers (if applicable) after the switch.
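Putting those steps together, a minimal end-to-end sketch (assumes the clone schema already exists; backup_20100914 is the placeholder backup schema used in the script further below):

-- 1. copy the data and add the new column in one pass
SELECT *, new_col = -1 INTO clone.BaseTable FROM dbo.BaseTable
GO
-- 2. recreate indexes, constraints, etc. on clone.BaseTable here
-- 3. switch old and new
ALTER SCHEMA backup_20100914 TRANSFER dbo.BaseTable
ALTER SCHEMA dbo TRANSFER clone.BaseTable
GO
-- 4. drop the old table once you're confident
-- DROP TABLE backup_20100914.BaseTable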


Non-simple Case

First, recreate your BaseTable with the same name under a different schema, eg clone.BaseTable. Using a separate schema will simplify the rename process later. (A minimal clone definition is sketched after the list below.)

  • Include the clustered index, if applicable. Remember that primary keys and unique constraints may be clustered, but not necessarily so.
  • Include identity columns and computed columns, if applicable.
  • Include your new INT column, wherever it belongs.
  • Do not include any of the following:
    • triggers
    • foreign key constraints
    • non-clustered indexes/primary keys/unique constraints
    • check constraints or default constraints. Defaults don't make much of a difference, but we're trying to keep things minimal.
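For illustration, a hypothetical clone definition; Col1/Col2/Col3 stand in for your real columns, with Col1 an identity clustered primary key and Col3 the new INT column:

CREATE TABLE clone.BaseTable
(
    Col1 INT IDENTITY   NOT NULL, -- identity column, mirrored from dbo.BaseTable
    Col2 VARCHAR(50)    NOT NULL,
    Col3 INT            NOT NULL, -- the new column
    CONSTRAINT PK_BaseTable PRIMARY KEY CLUSTERED (Col1) -- clustered index only; nothing else yet
)
GO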

Then, test your insert w/ 1000 rows:

-- assuming an IDENTITY column in BaseTable
SET IDENTITY_INSERT clone.BaseTable ON
GO
INSERT clone.BaseTable WITH (TABLOCK) (Col1, Col2, Col3)
SELECT TOP 1000 Col1, Col2, Col3 = -1
FROM dbo.BaseTable
GO
SET IDENTITY_INSERT clone.BaseTable OFF

Examine the results. If everything appears in order:

  • truncate the clone table
  • make sure the database is in the bulk-logged or simple recovery model
  • perform the full insert (sketched below).

This will take a while, but not nearly as long as an update. Once it completes, check the data in the clone table to make sure everything is correct.
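The full insert is the test statement above without the TOP clause:

SET IDENTITY_INSERT clone.BaseTable ON
GO
INSERT clone.BaseTable WITH (TABLOCK) (Col1, Col2, Col3)
SELECT Col1, Col2, Col3 = -1
FROM dbo.BaseTable
GO
SET IDENTITY_INSERT clone.BaseTable OFF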

Then, recreate all non-clustered primary keys/unique constraints/indexes and foreign key constraints (in that order). Recreate default and check constraints, if applicable. Recreate all triggers. Recreate each constraint, index or trigger in a separate batch. eg:

ALTER TABLE clone.BaseTable ADD CONSTRAINT UQ_BaseTable UNIQUE (Col2)
GO
-- next constraint/index/trigger definition here

Finally, move dbo.BaseTable to a backup schema and clone.BaseTable to the dbo schema (or wherever your table is supposed to live).

--
-- perform first true-up operation here, if necessary
-- EXEC clone.BaseTable_TrueUp
-- GO
--
-- create a backup schema, if necessary
-- CREATE SCHEMA backup_20100914
-- GO
BEGIN TRY
  BEGIN TRANSACTION
  ALTER SCHEMA backup_20100914 TRANSFER dbo.BaseTable
  --
  -- perform second true-up operation here, if necessary
  -- EXEC clone.BaseTable_TrueUp
  ALTER SCHEMA dbo TRANSFER clone.BaseTable
  COMMIT TRANSACTION
END TRY
BEGIN CATCH
  SELECT ERROR_MESSAGE() -- add more info here if necessary
  ROLLBACK TRANSACTION
END CATCH
GO

If you need to free up disk space, you may drop your original table at this time, though it may be prudent to keep it around a while longer.

Needless to say, this is ideally an offline operation. If you have people modifying data while you perform this operation, you will have to perform a true-up operation with the schema switch. I recommend creating a trigger on dbo.BaseTable to log all DML to a separate table. Enable this trigger before you start the insert. Then in the same transaction that you perform the schema transfer, use the log table to perform a true-up. Test this first on a subset of the data! Deltas are easy to screw up.
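A minimal sketch of such a logging trigger, assuming a single-column integer key Col1; the log table and all names here are hypothetical:

-- hypothetical DML log table keyed on BaseTable's Col1
CREATE TABLE dbo.BaseTable_DmlLog
(
    LogId    INT IDENTITY PRIMARY KEY,
    Action   CHAR(1)  NOT NULL,               -- 'I'nsert, 'U'pdate or 'D'elete
    Col1     INT      NOT NULL,               -- key of the affected row
    LoggedAt DATETIME NOT NULL DEFAULT GETDATE()
)
GO
CREATE TRIGGER dbo.trg_BaseTable_Log
ON dbo.BaseTable
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    -- rows present in both inserted and deleted are updates
    INSERT dbo.BaseTable_DmlLog (Action, Col1)
    SELECT CASE WHEN d.Col1 IS NULL THEN 'I'
                WHEN i.Col1 IS NULL THEN 'D'
                ELSE 'U' END,
           COALESCE(i.Col1, d.Col1)
    FROM inserted i
    FULL OUTER JOIN deleted d ON d.Col1 = i.Col1
END
GO

The true-up procedure then replays the logged rows against clone.BaseTable inside the same transaction as the schema transfer.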

answered by Peter Radocchia