
6NF and historical attribute data

When using a database normalized according to 6NF principles, how would you store historical attribute data?

Let's say, for example, we take this example from @PerformanceDBA but with the following extra requirement:

We need to store historical data for all our products: we should be able to just enter a date and get a snapshot of the attributes of the product at that particular time.

A more practical example:
Suppose the disks and CPUs from the example above are virtual and a user can change the disk capacity at will. How can we alter the database so that we can retrieve the attributes of a given disk at any time in the past (of course after its creation date) while keeping the 5NF view fast enough?

Things I'm considering

  • Add a timestamp column 'changedate' to each attribute table (this would result in a pretty complex query, with a subquery and a join per attribute table; see the sketch after this list)
  • Create a separate *_history table for each attribute table (this could result in a massive number of tables, since we have around 70 attributes spread over 20 product types)
  • Additionally: add an indexed 'current' column to each attribute table to speed up the 5NF view
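
A minimal sketch of what the first option might look like, assuming a hypothetical disk_capacity attribute table; the table, column names, and the :as_of parameter are illustrative, not taken from the linked example:

    -- Option 1 sketch: one 6NF attribute table with a 'changedate' column.
    CREATE TABLE disk_capacity (
        disk_id    integer     NOT NULL,
        capacity   bigint      NOT NULL,
        changedate timestamptz NOT NULL,
        PRIMARY KEY (disk_id, changedate)
    );

    -- Snapshot of the attribute as of :as_of: pick the latest change that is
    -- not later than that moment (one such subquery per attribute table is
    -- what makes the full snapshot query complex).
    SELECT dc.disk_id, dc.capacity
    FROM   disk_capacity dc
    WHERE  dc.changedate = (SELECT max(h.changedate)
                            FROM   disk_capacity h
                            WHERE  h.disk_id    = dc.disk_id
                            AND    h.changedate <= :as_of);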

Any help is appreciated!


Edit: I know about the concept of temporal databases, but the problem is that for the database engine I'm working with (PostgreSQL) the temporal extension isn't fully implemented yet. Any advice on how to achieve this without temporal databases?

asked Jan 23 '12 by ChrisR




1 Answer

The recently approved SQL:2011 standard incorporates features that let you deal with this kind of problem better than you ever could before.

Not that you'll be able to do everything you'd want to do in the temporal arena, but what did get introduced is indeed a fairly significant improvement.

A good presentation about it is at http://metadata-standards.org/Document-library/Documents-by-number/WG2-N1501-N1550/WG2_N1536_koa046-Temporal-features-in-SQL-standard.pdf .
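
For a rough idea of what those features look like, the application-time examples in that presentation boil down to DDL along these lines (a simplified sketch; the table and period names are illustrative and vendor dialects differ):

    -- Application-time period table, SQL:2011 style (simplified sketch).
    CREATE TABLE disk_capacity (
        disk_id  INTEGER NOT NULL,
        capacity BIGINT  NOT NULL,
        vstart   DATE    NOT NULL,
        vend     DATE    NOT NULL,
        PERIOD FOR disk_period (vstart, vend),
        PRIMARY KEY (disk_id, disk_period WITHOUT OVERLAPS)
    );

    -- System-versioned tables keep history automatically and are queried
    -- with a point-in-time clause such as:
    --   SELECT * FROM disk_capacity
    --   FOR SYSTEM_TIME AS OF TIMESTAMP '2012-01-01 00:00:00';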

Note that there's only a single vendor with reasonable support for these features in its SQL product; one other is perhaps working on it, and a third has opened a voting channel for its customers.

There's also a "Temporal Data" discussion group at www.linkedin.com dedicated to precisely your subject at hand.

EDIT trying to address "Any advice on how to achieve this without temporal databases?"

Do not add just a single date/time column to your models. The first reason is the one you gave yourself; the second is that the two-column approach described below is also the one promoted by the new standard, so it will facilitate the transition to engines that do support the new features once they are available.

So add BOTH a start and an end date/time column. DO NOT MAKE EITHER OF THEM NULLABLE. The new standard requires this for its temporal features. If the end MIT (moment in time) is still unknown, use the highest value of the applicable time type, e.g. 9999-12-31.
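
In PostgreSQL terms, one attribute table might then look like this (a sketch only; the disk_capacity name and its columns are made up for illustration):

    -- Start/end moments-in-time, both NOT NULL; an open-ended row uses the
    -- sentinel '9999-12-31' instead of NULL.
    CREATE TABLE disk_capacity (
        disk_id  integer     NOT NULL,
        capacity bigint      NOT NULL,
        vstart   timestamptz NOT NULL,
        vend     timestamptz NOT NULL DEFAULT TIMESTAMPTZ '9999-12-31 00:00:00',
        PRIMARY KEY (disk_id, vstart),
        CHECK (vstart < vend)
    );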

You do not NEED to "create separate history tables for each attribute". It is equally possible to have a "single entity table" that keeps the history of an entire entity occurrence. The downside is that it becomes difficult to query for when an ACTUAL change occurred to some particular attribute (because you get a new historical row for any change to any attribute, possibly copying over the same value for most of the attributes). The 'single table' is likely to be an eager consumer of space; the 'separate history per attribute' may be an eager consumer of query CPU time. It will be a balancing act, and where the balance lies precisely depends on your particular situation.
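
With the per-attribute-history layout, a point-in-time version of the 5NF view is a join of each attribute table filtered on the period columns. A sketch, assuming a disk entity table plus two illustrative attribute tables shaped like the one above, and an :as_of parameter:

    -- Rebuild a disk's attributes as they were at :as_of.
    SELECT d.disk_id, c.capacity, s.speed
    FROM   disk d
    JOIN   disk_capacity c ON  c.disk_id = d.disk_id
                           AND c.vstart <= :as_of AND :as_of < c.vend
    JOIN   disk_speed    s ON  s.disk_id = d.disk_id
                           AND s.vstart <= :as_of AND :as_of < s.vend;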

Do not "add an indexed 'current' column" to your tables. First, they will not help you transitioning to the new features when your engine has them, and second, Y/N columns are very bad discriminators, and therefore very poor candidates for indexing. I'd rather add your start- or end-mit to the index, they can be expected to give you the same wins for the 'current' rows, and a better win for the non-current rows, whenever you need to query those.

As for the enforcement of database constraints such as non-overlap of time periods in temporal keys and inclusion of time periods in temporal RI, well, you're entirely on your own. Write the code you need in triggers, SPROCs, or application code, in decreasing order of preference.
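
A minimal sketch of the trigger route in PostgreSQL, again against the illustrative disk_capacity table; it only checks non-overlap per disk and ignores concurrency and temporal RI:

    CREATE OR REPLACE FUNCTION disk_capacity_no_overlap() RETURNS trigger AS $$
    BEGIN
        -- Reject a row whose [vstart, vend) period overlaps another row for
        -- the same disk (the row's own vstart, part of the primary key, is
        -- excluded from the check).
        IF EXISTS (
            SELECT 1
            FROM   disk_capacity c
            WHERE  c.disk_id = NEW.disk_id
            AND    c.vstart <> NEW.vstart
            AND    c.vstart <  NEW.vend
            AND    NEW.vstart < c.vend
        ) THEN
            RAISE EXCEPTION 'overlapping period for disk %', NEW.disk_id;
        END IF;
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER disk_capacity_no_overlap_trg
    BEFORE INSERT OR UPDATE ON disk_capacity
    FOR EACH ROW EXECUTE PROCEDURE disk_capacity_no_overlap();

On more recent PostgreSQL versions (9.2 and up), range types plus an exclusion constraint can express the same non-overlap rule declaratively, which may be preferable to a hand-written trigger.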

Was this more helpful?

answered Sep 25 '22 by Erwin Smout