Best practice for archiving a huge table of over 1,000,000,000 rows

I'm using SQL Server 2005. There is an audit trail table containing over 1,000,000,000 rows, and I'm planning to archive it. When I run a simple SELECT with NOLOCK, I still see blocking (probably I/O contention with another process?). So are there any best practices for this kind of situation?

Asked Mar 02 '10 by developer.cyrus

People also ask

Can SQL Server handle billions of rows?

They are quite good at handling record counts in the billions, as long as you index and normalize the data properly, run the database on powerful hardware (especially SSDs if you can afford them), and partition across 2 or 3 or 5 physical disks if necessary.


1 Answer

For a table that large you will want an effective sharding/partitioning strategy. Archiving in this sense is really a form of partitioning, but not a good one, because you often want to query over the current and archived data anyway. In the worst case you end up with a SELECT over a UNION of the archive and current tables, which is worse than if you hadn't split them at all.
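To make the anti-pattern concrete, here is a minimal sketch of what that "worst case" looks like. The table, view, and column names (AuditTrail, AuditTrail_Archive, AuditTrailAll, EventDate) are illustrative assumptions, not from the original question:

```sql
-- A view that stitches the "current" and "archive" copies back together.
CREATE VIEW dbo.AuditTrailAll
AS
    SELECT AuditId, EventDate, EventType, Detail FROM dbo.AuditTrail
    UNION ALL
    SELECT AuditId, EventDate, EventType, Detail FROM dbo.AuditTrail_Archive;
GO

-- Any query over the combined history now has to touch both tables,
-- so for these reads the split bought you nothing:
SELECT COUNT(*)
FROM dbo.AuditTrailAll
WHERE EventDate >= '2009-01-01';
```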

You will often do better by slicing the data on some other attribute, such as record type. If you do split it by date, make absolutely sure you won't need to query across the archive and current data sets together; a partitioning sketch follows below.
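If the data is going to be split by date anyway, SQL Server's native partitioned tables (Enterprise Edition in 2005) are usually a cleaner way to do it than a separate archive table, because the table stays queryable as one object. This is only a sketch under assumed names (pfAuditByYear, psAuditByYear, AuditTrail, EventDate are all hypothetical):

```sql
-- Partition function: one partition per year of audit data.
CREATE PARTITION FUNCTION pfAuditByYear (DATETIME)
AS RANGE RIGHT FOR VALUES ('2008-01-01', '2009-01-01', '2010-01-01');

-- Partition scheme: map every partition to PRIMARY here,
-- or spread them over separate filegroups/disks in practice.
CREATE PARTITION SCHEME psAuditByYear
AS PARTITION pfAuditByYear ALL TO ([PRIMARY]);

-- The audit table is created on the scheme; the partitioning column
-- (EventDate) must be part of the clustered key.
CREATE TABLE dbo.AuditTrail
(
    AuditId    BIGINT          NOT NULL,
    EventDate  DATETIME        NOT NULL,
    EventType  INT             NOT NULL,
    Detail     NVARCHAR(4000)  NULL,
    CONSTRAINT PK_AuditTrail PRIMARY KEY CLUSTERED (EventDate, AuditId)
) ON psAuditByYear (EventDate);
```

With this layout, old partitions can later be moved to an archive table with ALTER TABLE ... SWITCH PARTITION, which is a metadata-only operation rather than a billion-row copy.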

Also, SQL Server 2005+ does not enable MVCC by default. It can behave that way, however, if you turn on what Microsoft calls Snapshot Isolation. See Serializable vs. Snapshot Isolation Level.

The effect of not having this enabled is that an uncommitted INSERT or UPDATE will block a SELECT in another transaction until the first transaction commits or rolls back. That can cause unnecessary locks and limit your scalability.
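Enabling row versioning is a database-level setting. A minimal sketch, assuming a database named AuditDb and the AuditTrail table from above:

```sql
-- Statement-level versioning: ordinary READ COMMITTED SELECTs read the last
-- committed row version instead of waiting on uncommitted changes.
ALTER DATABASE AuditDb SET READ_COMMITTED_SNAPSHOT ON;

-- Transaction-level versioning (what MS calls Snapshot Isolation).
ALTER DATABASE AuditDb SET ALLOW_SNAPSHOT_ISOLATION ON;

-- A session can then opt in explicitly:
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRANSACTION;
    SELECT COUNT(*) FROM dbo.AuditTrail;  -- consistent snapshot, no NOLOCK needed
COMMIT;
```

Note that the row version store lives in tempdb, so on a table this size it is worth watching tempdb growth after turning either option on.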

Answered Oct 16 '22 by cletus