Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Database row snapshots/revisions

I am looking for a suitable process by which to keep revisions, or snapshots of rows (and their relations) in a database.

Take for example an eCommerce platform-

  • A customer creates an order. The order is associated with a billing address and a delivery address.
  • Said customer then changes the address in their address book in their profile.
  • The address should not change for the original order.

I have looked at a few concepts, one being duplicate tables, another being temporal databases, and the other being keeping a revision id and active flag.

Whilst I appreciate no one can really tell me the best/most suitable solution for my application as it's a matter open to opinion etc, I was hoping that someone might be able to demonstrate the advantages/disadvantages possibly by way of comparison. I have read lots of questions on SO, and a fair few articles on the various implementations, but none really compare each idea or indicate where they'd be best suited. Below I have outlined my understanding of each of the concepts.

Duplicate tables

Store the information in rows relevant to the data with which they need to be snapshot with. I.e. Keep an address in columns in the orders table of an online store.

Advantages

  • Data is segmented into clearly relevant tables, no requirement for joins etc.
  • No need to select only active rows as required in the concepts below.
  • Assuming rows are timestamped, most of the benefits of temporal database retained

Disadvantages

  • Duplication
    • of schema (particularly problematic when multiple tables up a revision)
    • of models when using an ORM.
    • of data if a snapshot piece data hasn't changed and it's reused. I.e. if 10 orders are made, the address is stored 11 times (orders+current)
  • Extra code required to handle the inserts into relevant tables.

Temporal Databases/Active or Current row flag

Database rows which are "time-aware", i.e. their context is the time between two datetimes. Data can be joined where it's time context lies between that of the temporal table.

Advantages

  • No duplication of schema or models. Changes made in one place.
  • ORM model can handle creation of new row, marking as active etc seamlessly.
  • No replication of rows where no changes have been made. I.e. 10 orders to 1 address store the address once.

Disadvantages

  • Queries made more complex as joins/where clauses require selection of "active" row.
  • Tables become clogged up with historic data which is not selected/called upon regularly.

Storing just the changed column, temporal.

Have a table which keeps track of changes to all tables and note the row it relates to and when it is valid in terms of time.

Advantages

  • Optimised storage in terms of revisions as unchanged data not replicated.

Disadvantages

  • Queries far more complex to combine version of column with it's other data.

I have already looked at the following questions here on SO, and these other resources

Edit: The reason I haven't tagged this post with a particular DBMS as I'd like the concept to work with as many as possible ideally as the platform, at present is DBMS independent and the abstraction layer allows it to work with MySQL and MSSQL but will hopefully support others in the future.

  • Database Design for Revisions?
  • Relational Schema for Fowler's Temporal Expressions
  • Database design for text revisions
  • Storing Revisions of Relational Objects in an Efficient Way
  • Keeping page changes history. A bit like SO does for revisions
  • maintain history in a database
  • http://en.wikipedia.org/wiki/Temporal_database
  • http://www.simple-talk.com/sql/database-administration/database-design-a-point-in-time-architecture/
like image 795
Ben Swinburne Avatar asked Sep 07 '12 15:09

Ben Swinburne


People also ask

What are database snapshots?

A database snapshot is a read-only, static view of a SQL Server database (the source database). The database snapshot is transactionally consistent with the source database as of the moment of the snapshot's creation. A database snapshot always resides on the same server instance as its source database.

Why snapshot is faster than backup?

The two are very different techniques and from a performance perspective, a snapshot is created much faster than a backup/restore, but that is because it does not contain any real data when you first create it. Changes to the data pages are written to sparse file only when there is an update on the source database.


1 Answers

I ended up using a temporal database, and the implementation of this resulted in the Temporal Model in FuelPHP.

I can now configure my models to treat rows as a time sensitive entity. Changes cause a new row to be created and the end time of the original row set accordingly.

This allows me to retrieve a row at a point in time.

like image 140
Ben Swinburne Avatar answered Oct 02 '22 00:10

Ben Swinburne