
Database design for point in time "snapshot" of data?


How do I design a database that lets an application user create a snapshot of their data at a point in time, a bit like version control?

It would give the user the ability to go back and see what their data looked like in the past.

Assume that the data being "snapshotted" is complex and includes joins of multiple tables.

I'm looking for a way to give each application user the ability to snapshot their data and go back to it. Whole-database snapshots are not what I'm looking for.

EDIT: Thanks for your answers. The 6NF answer is compelling as is the suggestion to de-normalise the snapshot data due to its simplicity.

Clarification: this is not a data warehousing question, nor is it a question about DB backup and restore; it's about how to build a schema that allows us to capture the state of a specific set of related data at a point in time. The snapshots are generated by the application users when they see fit. Users do not snapshot the entire DB, just the data object they are interested in.

saille asked Apr 06 '09 09:04




2 Answers

This is NOT easy.

You're essentially asking for a temporal database (what Christopher Date calls Sixth Normal Form, or 6NF).

To be 6NF, a schema must also be 5NF, and, basically, for each datum, you need to attach a time range for which the datum at that value is applicable. Then in joins, the join must include only the rows that are within the time range being considered.

Temporal modeling is hard -- it's what 6th Normal Form addresses -- and not well supported in current RDBMSes.

The problem is the granularity. 6th Normal Form (as I understand it) supports temporal modeling by making every non-key attribute (i.e., anything "on" the entity that can change without the entity losing its identity) a separate relation. To each of these relations, you add a timestamp, a time range, or a version number. Making everything a join solves the granularity problem, but it also means your queries are going to be more complicated and slower. It also requires identifying all key and non-key attributes, which tends to be a large effort.
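A minimal sketch of that decomposition, using Python's built-in sqlite3 module so it can be run as-is. The table and column names (person_name, person_state, valid_from, valid_to) and the helper person_as_of are assumptions made up for illustration, not part of any standard; valid_to is treated as exclusive, with NULL meaning "still current".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (id INTEGER PRIMARY KEY);

-- One relation per non-key attribute, each carrying its own validity range.
-- valid_to is exclusive; NULL means the value is still current (assumption).
CREATE TABLE person_name (
    person_id  INTEGER NOT NULL REFERENCES person(id),
    name       TEXT NOT NULL,
    valid_from TEXT NOT NULL,
    valid_to   TEXT
);
CREATE TABLE person_state (
    person_id  INTEGER NOT NULL REFERENCES person(id),
    state      TEXT NOT NULL,
    valid_from TEXT NOT NULL,
    valid_to   TEXT
);

INSERT INTO person VALUES (1);
INSERT INTO person_name  VALUES (1, 'ted', '2000-01-01', NULL);
INSERT INTO person_state VALUES (1, 'Nebraska', '2000-01-01', '2005-06-01');
INSERT INTO person_state VALUES (1, 'Nevada',   '2005-06-01', NULL);
""")


def person_as_of(conn, person_id, as_of):
    """Reassemble one person's attributes as they were on as_of (ISO date string).

    Every attribute table is joined with the same date predicate, which is
    where the extra query complexity comes from.
    """
    return conn.execute(
        """
        SELECT n.name, s.state
        FROM person p
        JOIN person_name n
          ON n.person_id = p.id
         AND n.valid_from <= :d AND (n.valid_to IS NULL OR :d < n.valid_to)
        JOIN person_state s
          ON s.person_id = p.id
         AND s.valid_from <= :d AND (s.valid_to IS NULL OR :d < s.valid_to)
        WHERE p.id = :id
        """,
        {"d": as_of, "id": person_id},
    ).fetchone()
```

Calling person_as_of(conn, 1, '2003-01-01') reads ted's state from the Nebraska row, while a date after the move reads it from the Nevada row. ISO-formatted date strings are used because they compare correctly as plain text.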

Basically, everywhere you have a relation ("ted owns the GM stock certificate with id 789") you add a time: "ted owns the GM stock certificate with id 789 now", so that you can simultaneously say, "fred owned the GM stock certificate with id 789 from 3 Feb 2000 to yesterday". Obviously these relations are many-to-many: ted can own more than one certificate now, and more than one over his lifetime, too, and fred can have previously owned the certificate jack owns now.

So we have a table of owners, and a table of stock certificates, and a many-to-many table that relates owners and certificates by id. To the many-to-many table, we add a start_date and an end_date.
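The three tables can be sketched like this with Python's sqlite3 module. The table name owner_certificate, the sample dates, and the helper owner_of are hypothetical; end_date is assumed to be exclusive, with NULL meaning the certificate is still held.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE owner (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE stock_certificate (id INTEGER PRIMARY KEY, company TEXT NOT NULL);

-- Many-to-many link table with the validity range added to it.
CREATE TABLE owner_certificate (
    owner_id       INTEGER NOT NULL REFERENCES owner(id),
    certificate_id INTEGER NOT NULL REFERENCES stock_certificate(id),
    start_date     TEXT NOT NULL,
    end_date       TEXT            -- exclusive; NULL = still owned (assumption)
);

INSERT INTO owner VALUES (1, 'fred');
INSERT INTO owner VALUES (2, 'ted');
INSERT INTO stock_certificate VALUES (789, 'GM');
INSERT INTO owner_certificate VALUES (1, 789, '2000-02-03', '2009-04-05');
INSERT INTO owner_certificate VALUES (2, 789, '2009-04-05', NULL);
""")


def owner_of(conn, certificate_id, as_of):
    """Return the name of whoever held the certificate on as_of, or None."""
    row = conn.execute(
        """
        SELECT o.name
        FROM owner_certificate oc
        JOIN owner o ON o.id = oc.owner_id
        WHERE oc.certificate_id = ?
          AND oc.start_date <= ?
          AND (oc.end_date IS NULL OR ? < oc.end_date)
        """,
        (certificate_id, as_of, as_of),
    ).fetchone()
    return row[0] if row else None
```

With this data, owner_of(conn, 789, '2005-01-01') finds fred's row and a date on or after the handover finds ted's. Nothing in this sketch prevents two overlapping ownership rows for the same certificate; that constraint would have to be enforced separately.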

Now, imagine that each state/province/land taxes the dividends on stock certificates, so for tax purposes we need to record the stock certificate owner's state of residency.

Where the owner resides can obviously change independently of stock ownership; ted can live in Nebraska, buy 10 shares, get a dividend that Nebraska taxes, move to Nevada, sell 5 shares to fred, buy 10 more shares.

But for us, it's ted can move to Nebraska at some time, buy 10 shares at some time, get a dividend at some time, which Nebraska taxes, move to Nevada at some time, sell 5 shares to fred at some time, buy 10 more shares at some time.

We need all of that if we want to calculate what taxes ted owes in Nebraska and in Nevada, joining on the matching/overlapping date ranges in person_stockcertificate and person_address. A person's address is no longer one-to-one; it's one-to-many, because it's an address during a time range.
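A rough sketch of that overlapping-range join, again using Python's sqlite3 module. The column layout and sample dates are invented for illustration, and to keep the predicate short this version assumes every range has an exclusive end_date, with '9999-12-31' standing in for "still current" instead of NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person_address (
    person     TEXT NOT NULL,
    state      TEXT NOT NULL,
    start_date TEXT NOT NULL,
    end_date   TEXT NOT NULL   -- exclusive; '9999-12-31' = open-ended (assumption)
);
CREATE TABLE person_stockcertificate (
    person         TEXT NOT NULL,
    certificate_id INTEGER NOT NULL,
    start_date     TEXT NOT NULL,
    end_date       TEXT NOT NULL   -- exclusive
);

INSERT INTO person_address VALUES ('ted', 'Nebraska', '2000-01-01', '2005-06-01');
INSERT INTO person_address VALUES ('ted', 'Nevada',   '2005-06-01', '9999-12-31');
INSERT INTO person_stockcertificate VALUES ('ted', 789, '2003-01-01', '2008-01-01');
""")

# Two half-open ranges overlap when each one starts before the other ends.
# The two-argument MAX/MIN calls are SQLite's scalar functions; they clip the
# result to the period in which both facts were true at once.
rows = conn.execute(
    """
    SELECT a.state,
           s.certificate_id,
           MAX(a.start_date, s.start_date) AS overlap_start,
           MIN(a.end_date,   s.end_date)   AS overlap_end
    FROM person_stockcertificate s
    JOIN person_address a
      ON a.person = s.person
     AND a.start_date < s.end_date
     AND s.start_date < a.end_date
    ORDER BY overlap_start
    """
).fetchall()

for state, cert, start, end in rows:
    print(f"{state} can tax certificate {cert} from {start} up to {end}")
```

Each output row is a period during which ted both held the certificate and lived in that state, which is the input a per-state tax calculation would need.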

If ted buys ten shares, do we model a buy event with a single purchase date, or do we add a date_bought to each share? Depends on the question we need the model to answer.

tpdi answered Sep 20 '22 00:09


We did this once by creating separate database tables that contained the data we wanted to snapshot, but denormalized, i.e. every record contained all the data required to make sense, not references to IDs that may or may not still exist. We also added a date to each row.

Then we produced triggers for specific inserts or updates that did a join on all affected tables, and inserted it into the snapshot tables.

This way it would be trivial to write something that restored the users' data to a point in time.

If you have these tables:

user:

id, firstname, lastname, department_id 

department:

id, name, departmenthead_id 

your snapshot of the user table could look like this:

user_id, user_firstname, user_lastname, department_id, department_name, departmenthead_id, departmenthead_firstname, departmenthead_lastname, snapshot_date 

and a query something like

-- user is a reserved word in SQL Server, so the table name is bracketed
INSERT INTO usersnapshot
SELECT [user].id AS user_id,
       [user].firstname AS user_firstname,
       [user].lastname AS user_lastname,
       department.id AS department_id,
       department.name AS department_name,
       departmenthead.id AS departmenthead_id,
       departmenthead.firstname AS departmenthead_firstname,
       departmenthead.lastname AS departmenthead_lastname,
       GETDATE() AS snapshot_date
FROM [user]
INNER JOIN department ON [user].department_id = department.id
INNER JOIN [user] departmenthead ON department.departmenthead_id = departmenthead.id

This ensures each row in the snapshot is true for that moment in time, even if department or department head has changed in the meantime.
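For a version that can be run end to end, here is a sketch of the trigger approach using Python's sqlite3 module rather than SQL Server. Several things are assumptions for the sake of the example: the user table is renamed app_user, the trigger fires only on updates to that table, the sample names are invented, and snapshot_as_of is a hypothetical helper for reading the data back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE department (
    id INTEGER PRIMARY KEY, name TEXT, departmenthead_id INTEGER
);
CREATE TABLE app_user (
    id INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT,
    department_id INTEGER REFERENCES department(id)
);
CREATE TABLE usersnapshot (
    user_id INTEGER, user_firstname TEXT, user_lastname TEXT,
    department_id INTEGER, department_name TEXT,
    departmenthead_id INTEGER, departmenthead_firstname TEXT,
    departmenthead_lastname TEXT, snapshot_date TEXT
);

-- After a user row is updated, copy the fully joined, denormalized row
-- into the snapshot table together with the current UTC timestamp.
CREATE TRIGGER user_snapshot_on_update AFTER UPDATE ON app_user
BEGIN
    INSERT INTO usersnapshot
    SELECT u.id, u.firstname, u.lastname,
           d.id, d.name,
           h.id, h.firstname, h.lastname,
           strftime('%Y-%m-%d %H:%M:%f', 'now')
    FROM app_user u
    JOIN department d ON u.department_id = d.id
    JOIN app_user h ON d.departmenthead_id = h.id
    WHERE u.id = NEW.id;
END;
""")

conn.execute("INSERT INTO app_user VALUES (1, 'Ann', 'Boss', NULL)")
conn.execute("INSERT INTO department VALUES (10, 'Sales', 1)")
conn.execute("INSERT INTO app_user VALUES (2, 'Bob', 'Smith', 10)")

conn.execute("UPDATE app_user SET lastname = 'Jones' WHERE id = 2")   # fires the trigger
conn.execute("UPDATE department SET name = 'Revenue' WHERE id = 10")  # no trigger on this table

snapshots = conn.execute(
    "SELECT user_lastname, department_name, departmenthead_firstname "
    "FROM usersnapshot WHERE user_id = 2"
).fetchall()


def snapshot_as_of(conn, user_id, as_of):
    """Return the newest snapshot row taken at or before as_of, or None."""
    return conn.execute(
        """
        SELECT * FROM usersnapshot
        WHERE user_id = ? AND snapshot_date <= ?
        ORDER BY snapshot_date DESC
        LIMIT 1
        """,
        (user_id, as_of),
    ).fetchone()
```

The stored row keeps the department name as it was when the snapshot was taken, even though the department is renamed afterwards. A fuller version would presumably also need triggers on department, or an explicit "take snapshot" action invoked by the application user, as the question describes.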

Kamiel Wanrooij answered Sep 24 '22 00:09