 

Archiving Large Table (SQL Server 2008)

I have a very large table that is being filled with hundreds of millions of records each quarter.

I manually move data from the existing table to another database using a script, to minimize the backup size and to offload the production database when running queries.

Is there a better way, for example a scheduled script that moves data from the production database to another database and then efficiently deletes the records from the source table every day or week?

Note that my log file is growing rapidly due to the high number of INSERTs into this table; also, when I move data to the archive database, the DELETEs will be logged as well.

Thanks

asked Oct 23 '12 by PyQL

1 Answer

Let me recap the requirements:

  1. reduce the backup size
  2. reduce the number of records in the database by archiving
  3. archive the data without excessive logging

In order to reduce the backup size, you'll need to move the data into a different database.

As far as logging goes, you'll want to look over the rules of minimal logging and make sure that you are following them. In particular, make sure the database you are inserting into uses the simple or bulk-logged recovery model.
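For example, a minimal sketch (ArchiveDB is a placeholder name for the archive database):

ALTER DATABASE ArchiveDB SET RECOVERY BULK_LOGGED;   -- allows minimally logged bulk loads

-- ... perform the archive inserts here ...

ALTER DATABASE ArchiveDB SET RECOVERY FULL;          -- or back to whatever model you normally use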

For inserting the archived data, you want to disable the non-clustered indexes (and rebuild them after the insert has completed), use trace flag 610 if there is a clustered index, and take a table lock on the destination table. There are many more rules in the link that you'll want to check off, but these are the basics.
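A rough sketch of such an insert, assuming placeholder names (ArchiveDB, dbo.BigTable_Archive, IX_Archive_Date, Prod.dbo.BigTable) and a date-based archive predicate:

USE ArchiveDB;

ALTER INDEX IX_Archive_Date ON dbo.BigTable_Archive DISABLE;   -- disable non-clustered indexes before the load

DBCC TRACEON (610);                                            -- trace flag 610: minimal logging into a table with a clustered index

INSERT INTO dbo.BigTable_Archive WITH (TABLOCK)                -- table lock is required for minimal logging
SELECT *
FROM   Prod.dbo.BigTable
WHERE  CreatedDate < '20120701';                               -- placeholder condition for the rows being archived

ALTER INDEX IX_Archive_Date ON dbo.BigTable_Archive REBUILD;   -- rebuild the disabled indexes afterwards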

There is no minimally logged DELETE, but you can limit log file growth by deleting in chunks with the TOP clause (and by switching to the simple recovery model for the duration of the delete). The basic idea is:

SELECT NULL;   -- seeds @@ROWCOUNT so the loop runs at least once

WHILE @@ROWCOUNT > 0

     DELETE TOP (50000) FROM dbo.BigTable WHERE Condition = 1;   -- table name and predicate are placeholders: match only rows already copied to the archive

Adjust the TOP value to control how much is logged per delete, and make sure the predicate is correct so that you only delete what you intend to. The statement deletes 50,000 rows at a time; as long as @@ROWCOUNT comes back greater than zero the loop repeats, and it stops once the rowcount returned is 0.

If you really want minimal logging for everything, you can partition the source table by week, create a clone of the source table (on the same partition function and with an identical indexing structure), switch the partition from the source table to the cloned table, insert from the cloned table into the archive table, and then truncate the cloned table. The advantage is that the cleanup is a truncate rather than a delete. The disadvantage is that it is much more complicated to set up, maintain, and query (you get one heap or b-tree per partition, so if queries don't use partition elimination, a clustered index/table scan has to scan multiple b-trees/heaps instead of just one).
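A sketch of that partition-switch approach, with assumed names (dbo.BigTable, dbo.BigTable_Staging, ArchiveDB.dbo.BigTable_Archive) and partition number 42 standing in for the week being archived; the staging table must sit on the same partition scheme with an identical structure:

-- Metadata-only operation: the week's rows move out of the source table instantly.
ALTER TABLE dbo.BigTable SWITCH PARTITION 42 TO dbo.BigTable_Staging PARTITION 42;

-- Copy the switched-out rows into the archive database (the minimal logging rules above still apply).
INSERT INTO ArchiveDB.dbo.BigTable_Archive WITH (TABLOCK)
SELECT * FROM dbo.BigTable_Staging;

-- Truncate instead of delete: minimal logging on the cleanup as well.
TRUNCATE TABLE dbo.BigTable_Staging;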

answered Oct 03 '22 by brian