SQL Database Best Practices - Use of Archive tables?

I'm not a trained DBA, but I perform some SQL tasks and have this question:

In SQL databases I've noticed the use of archive tables that mimic another table, with exactly the same fields, and which are used to accept rows from the original table when that data is deemed ready for archiving. Since I've seen examples where those tables reside in the same database and on the same drive, my assumption is that this was done to increase performance. Such tables didn't have more than about 10 million rows in them...

  • Why would this be done instead of using a column to designate the status of the row, such as a boolean active/inactive flag?
  • At what point would this improve performance?
  • What would be the best pattern to structure this correctly, given that the data may still need to be queried (or unioned with current data)?
  • What else is there to say about this?
asked Jan 16 '14 by matrix4use


2 Answers

The notion of archiving is a physical, not logical, one. Logically the archive table contains the exact same entity and ought to be the same table.

Physical concerns tend to be pragmatic. The overarching notion is that the database is getting "too big/slow". Archiving records makes it easier to do things like the following (a basic sketch of the row-movement pattern appears after the list):

  1. Optimize the index structure differently. Archive tables can have more indexes without affecting insert/update performance on the working table. In addition, the indexes can be rebuilt with full pages, while the working table will generally want to have pages that are 50% full and balanced.

  2. Optimize storage media differently. You can put the archive table on slower/less expensive disk drives that maybe have more capacity.

  3. Optimize backup strategies differently. Working tables may require hot backups or log shipping while archive tables can use snapshots.

  4. Optimize replication differently, if you are using it. If an archive table is only updated once per day via nightly batch, you can use snapshot as opposed to transactional replication.

  5. Different levels of access. Perhaps you want different security access levels for the archive table.

  6. Lock contention. If your working table is very hot, you'd rather have your MIS developers access the archive table, where they are less likely to halt your operations when they run something and forget to specify dirty-read semantics.
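
As a rough illustration of the physical split, here is a minimal sketch of the usual row-movement pattern, assuming a hypothetical Orders working table, an identically structured Orders_Archive, and an OrderDate-based retention cutoff (SQL Server-flavored syntax; the transaction and view statements vary slightly by RDBMS):

    -- Move rows past the retention cutoff in a single transaction so a
    -- failure leaves neither table half-updated.
    BEGIN TRANSACTION;

    INSERT INTO Orders_Archive
    SELECT * FROM Orders
    WHERE OrderDate < '2013-01-01';

    DELETE FROM Orders
    WHERE OrderDate < '2013-01-01';

    COMMIT;

    -- A UNION ALL view keeps "query everything" cases simple.
    CREATE VIEW Orders_All AS
        SELECT * FROM Orders
        UNION ALL
        SELECT * FROM Orders_Archive;

On a busy working table you would typically batch the move (a few thousand rows per transaction) to keep locks short.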

The best practice would be not to use archive tables at all, but to move the data from the OLTP database to an MIS database, data warehouse, or data mart with denormalized data. But some organizations will have trouble justifying the cost of an additional DB system (which isn't cheap). There are far fewer hurdles to adding another table to an existing DB.
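
For contrast, here is a minimal sketch of what "moving the data out" tends to look like: a nightly batch that flattens normalized OLTP rows into a denormalized table in a separate reporting database. All table and column names are hypothetical, and the cross-database naming and date arithmetic vary by platform:

    -- Nightly load: flatten yesterday's orders into a wide reporting table
    -- that lives in a separate MIS/warehouse database (addressed here via a
    -- "reporting" prefix; qualify the name however your platform requires).
    INSERT INTO reporting.OrdersFlat
        (OrderId, OrderDate, CustomerName, ProductName, Quantity, Amount)
    SELECT o.OrderId,
           o.OrderDate,
           c.CustomerName,
           p.ProductName,
           ol.Quantity,
           ol.Quantity * ol.UnitPrice
    FROM Orders o
    JOIN OrderLines ol ON ol.OrderId = o.OrderId
    JOIN Customers c   ON c.CustomerId = o.CustomerId
    JOIN Products p    ON p.ProductId = ol.ProductId
    WHERE o.OrderDate >= CURRENT_DATE - INTERVAL '1' DAY;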

answered Oct 11 '22 by John Wu


I say this frequently, but...

Multiple tables of identical structure almost never make sense.

A status flag is a much better idea. There are proper ways to increase performance (partitioning/indexing) without denormalizing data or otherwise creating redundancies. 10 million records is pretty small in the world of modern RDBMSs, so what you're seeing is the product of poor planning or a misunderstanding of databases.
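
As a minimal sketch of the flag-plus-indexing approach (PostgreSQL-flavored; the table and column names are made up, and other engines would use a filtered index or partitioning on the flag instead of a partial index):

    -- Keep everything in one table; archiving becomes a flag update.
    ALTER TABLE orders ADD COLUMN is_active boolean NOT NULL DEFAULT true;

    -- A partial index keeps hot queries against active rows fast even
    -- though archived rows live in the same table.
    CREATE INDEX idx_orders_active_customer
        ON orders (customer_id, order_date)
        WHERE is_active;

    -- "Archiving" is now an UPDATE, and nothing needs to be unioned back.
    UPDATE orders
    SET is_active = false
    WHERE order_date < '2013-01-01';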

answered Oct 11 '22 by Hart CO