
Copy one column to another for over a billion rows in SQL Server database

Database: SQL Server 2005

Problem: Copy values from one column to another column in the same table with a billion+ rows.

test_table (id int, bigid bigint)
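
For reference, a rough DDL sketch of the table described above (only the two columns are given, so constraints and indexes are left out; the answers below assume id is the primary key or clustered index):

-- Rough sketch of the table layout described above; only the two
-- columns are given, so constraints and indexes are omitted.
create table dbo.test_table
(
    id    int    not null,
    bigid bigint null
)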

Tried 1 - a plain update query

update test_table set bigid = id 

This fills up the transaction log and rolls back due to lack of transaction log space.

Tried 2 - a procedure along the following lines

set nocount on
set rowcount 500000   -- SET ROWCOUNT takes no '='; limits each update to 500,000 rows

declare @rowcount int, @rowsupdated bigint
set @rowcount = 1
set @rowsupdated = 0

while @rowcount > 0
begin
    update test_table set bigid = id where bigid is null
    set @rowcount = @@rowcount
    set @rowsupdated = @rowsupdated + @rowcount
end

set rowcount 0        -- reset the limit
print @rowsupdated

The above procedure starts slowing down as it proceeds.

Tried 3 - a cursor for update.

Cursors are generally discouraged in the SQL Server documentation, and this approach updates one row at a time, which is too time-consuming.

Is there an approach that can speed up copying the values from one column to another? Basically, I am looking for some 'magic' keyword or logic that will allow the update to rip through the billion rows half a million at a time, sequentially.

Any hints or pointers will be much appreciated.

asked Sep 22 '10 by Adi Pandit

2 Answers

I'm going to guess that you are closing in on the 2.1 billion limit of an INT datatype on an artificial key for a column. Yes, that's a pain. Much easier to fix before the fact than after you've actually hit that limit and production is shut down while you are trying to fix it :)

Anyway, several of the ideas here will work. Let's talk about speed, efficiency, indexes, and log size, though.

Log Growth

The log blew up originally because it was trying to commit all 2b rows at once. The suggestions in other posts for "chunking it up" will work, but that may not totally resolve the log issue.

If the database is in SIMPLE mode, you'll be fine (the log will re-use itself after each batch). If the database is in FULL or BULK_LOGGED recovery mode, you'll have to run log backups frequently during the running of your operation so that SQL can re-use the log space. This might mean increasing the frequency of the backups during this time, or just monitoring the log usage while running.
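
As a rough sketch (the database name and backup path below are placeholders, not from the question), you can check the recovery model and take log backups between batches like this:

-- Check the current recovery model (test_db is a placeholder name).
select name, recovery_model_desc
from sys.databases
where name = 'test_db'

-- In FULL or BULK_LOGGED recovery, back up the log periodically while
-- the batched update runs so SQL can reuse the log space.
-- The backup path is a placeholder.
backup log test_db to disk = 'D:\Backups\test_db_log.trn'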

Indexes and Speed

ALL of the "where bigid is null" answers will slow down as the table is populated, because there is (presumably) no index on the new BIGID field. You could, of course, just add an index on BIGID, but I'm not convinced that is the right answer.
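
For completeness, that index would be something along these lines (just a sketch; as noted, it may not be the right answer here, and building it on a billion-row table is a sizeable operation in itself):

-- Sketch of the index mentioned above (not necessarily recommended;
-- building it on a billion-row table takes significant time and space).
create nonclustered index IX_test_table_bigid on test_table (bigid)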

The key (pun intended) is my assumption that the original ID field is probably the primary key, or the clustered index, or both. In that case, let's take advantage of that fact and do a variation of Jess' idea:

declare @counter bigint
set @counter = 1

while @counter < 2000000000 --or whatever
begin
  update test_table set bigid = id
  where id between @counter and (@counter + 499999) --BETWEEN is inclusive
  set @counter = @counter + 500000
end

This should be extremely fast, because of the existing indexes on ID.

The IS NULL check really wasn't necessary anyway, and neither is my (-1) on the interval. If we duplicate some rows between calls, that's not a big deal.
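
If hard-coding the upper bound is a concern, a variation of the same loop (just a sketch) can read the current maximum id first:

-- Variation: derive the loop bound from the table instead of hard-coding it.
declare @counter bigint, @max_id bigint
select @max_id = max(id) from test_table

set @counter = 1
while @counter <= @max_id
begin
  update test_table set bigid = id
  where id between @counter and (@counter + 499999)
  set @counter = @counter + 500000
end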

answered Oct 08 '22 by BradC

Use TOP in the UPDATE statement:

DECLARE @row_limit int
SET @row_limit = 500000   -- batch size; pick whatever your log can handle

UPDATE TOP (@row_limit) dbo.test_table
   SET bigid = id
 WHERE bigid IS NULL
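
Since TOP only limits a single statement, you repeat it until nothing is left to update; a batched sketch (the 500,000 batch size is just an example) could look like this:

-- Repeat the TOP-limited update until every NULL bigid has been filled.
DECLARE @row_limit int
SET @row_limit = 500000

WHILE 1 = 1
BEGIN
    UPDATE TOP (@row_limit) dbo.test_table
       SET bigid = id
     WHERE bigid IS NULL

    IF @@ROWCOUNT = 0 BREAK
END

As the other answer notes, the bigid IS NULL filter gets slower as the table fills up unless bigid is indexed, so the id-range approach may still be faster overall.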
answered Oct 08 '22 by OMG Ponies