
6NF and historical attribute data

When using a database normalized according to 6NF principles, how would you store historical attribute data?

Let's say, for example, we take this example from @PerformanceDBA but with the following extra requirement:

We need to store historical data for all our products: we should be able to just enter a date and get a snapshot of the attributes of the product at that particular time.

A more practical example:
Suppose the disks and CPUs from the example above are virtual and a user can change the disk capacity at will. How can we alter the database so that we can retrieve the attributes of a given disk at any time in the past (of course after its creation date) while keeping the 5NF view fast enough?

Things I'm considering

  • Add a timestamp column 'changedate' to each attribute table (this would result in a pretty complex query, with a subquery and a join per attribute table; see the sketch after this list)
  • Create a separate *_history table for each attribute table (this could result in a massive number of tables, since we have around 70 attributes spread over 20 product types)
  • Additionally: add an indexed 'current' column to each attribute table to speed up the 5NF view
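
A minimal sketch of what the first option might look like, assuming a hypothetical disk_capacity attribute table; the table, column names, and the :as_of parameter are illustrative, not taken from the linked example:

    -- Option 1 sketch: one 6NF attribute table with a 'changedate' column.
    CREATE TABLE disk_capacity (
        disk_id    integer     NOT NULL,
        capacity   bigint      NOT NULL,
        changedate timestamptz NOT NULL,
        PRIMARY KEY (disk_id, changedate)
    );

    -- Snapshot of the attribute as of :as_of: pick the latest change that is
    -- not later than that moment (one such subquery per attribute table is
    -- what makes the full snapshot query complex).
    SELECT dc.disk_id, dc.capacity
    FROM   disk_capacity dc
    WHERE  dc.changedate = (SELECT max(h.changedate)
                            FROM   disk_capacity h
                            WHERE  h.disk_id    = dc.disk_id
                            AND    h.changedate <= :as_of);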

Any help is appreciated!


Edit: I know about the concept of temporal databases, but the problem is that for the database engine I'm working with (PostgreSQL) the temporal extension isn't fully implemented yet. Any advice on how to achieve this without temporal databases?

asked Jan 23 '12 by ChrisR




1 Answer

The recently approved SQL:2011 standard incorporates features that let you deal with this kind of problem better than you ever could before.

Not that you'll be able to do everything you'd want to do in the temporal arena, but what did get introduced is indeed a fairly significant improvement.

A good presentation about it is at http://metadata-standards.org/Document-library/Documents-by-number/WG2-N1501-N1550/WG2_N1536_koa046-Temporal-features-in-SQL-standard.pdf .
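
For a rough idea of what those features look like, the application-time examples in that presentation boil down to DDL along these lines (a simplified sketch; the table and period names are illustrative and vendor dialects differ):

    -- Application-time period table, SQL:2011 style (simplified sketch).
    CREATE TABLE disk_capacity (
        disk_id  INTEGER NOT NULL,
        capacity BIGINT  NOT NULL,
        vstart   DATE    NOT NULL,
        vend     DATE    NOT NULL,
        PERIOD FOR disk_period (vstart, vend),
        PRIMARY KEY (disk_id, disk_period WITHOUT OVERLAPS)
    );

    -- System-versioned tables keep history automatically and are queried
    -- with a point-in-time clause such as:
    --   SELECT * FROM disk_capacity
    --   FOR SYSTEM_TIME AS OF TIMESTAMP '2012-01-01 00:00:00';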

Note that there's only a single vendor with reasonable support for these features in its SQL product; one other is perhaps working on it, and a third has opened a voting channel for its customers.

There's also a "Temporal Data" discussion group at www.linkedin.com dedicated to precisely your subject at hand.

EDIT trying to address "Any advice on how to achieve this without temporal databases?"

Do not add just a single date/time column to your models. The first reason is the one you gave yourself; the second is that the two-column approach described below is also the one promoted by the new standard, so it will facilitate the transition to engines that do support the new features once they are available.

So add BOTH a start and an end date/time column. DO NOT MAKE EITHER OF THEM NULLABLE. The new standard requires this for its temporal features. If the end MIT (moment in time) is still unknown, use the highest value of the applicable time type, e.g. 9999-12-31.
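
In PostgreSQL terms, one attribute table might then look like this (a sketch only; the disk_capacity name and its columns are made up for illustration):

    -- Start/end moments-in-time, both NOT NULL; an open-ended row uses the
    -- sentinel '9999-12-31' instead of NULL.
    CREATE TABLE disk_capacity (
        disk_id  integer     NOT NULL,
        capacity bigint      NOT NULL,
        vstart   timestamptz NOT NULL,
        vend     timestamptz NOT NULL DEFAULT TIMESTAMPTZ '9999-12-31 00:00:00',
        PRIMARY KEY (disk_id, vstart),
        CHECK (vstart < vend)
    );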

You do not NEED to "create separate history tables for each attribute". It is equally possible to have a "single entity table" that keeps the history of an entire entity occurrence. The downside is that it becomes difficult to query for when an ACTUAL change occurred to some particular attribute (because you get a new historical row for any change to any attribute, possibly copying over the same value for most of the attributes). The 'single table' is likely to be an eager consumer of space; the 'separate history per attribute' may be an eager consumer of query CPU time. It will be a balancing act, and where the balance lies precisely depends on your particular situation.
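
With the per-attribute-history layout, a point-in-time version of the 5NF view is a join of each attribute table filtered on the period columns. A sketch, assuming a disk entity table plus two illustrative attribute tables shaped like the one above, and an :as_of parameter:

    -- Rebuild a disk's attributes as they were at :as_of.
    SELECT d.disk_id, c.capacity, s.speed
    FROM   disk d
    JOIN   disk_capacity c ON  c.disk_id = d.disk_id
                           AND c.vstart <= :as_of AND :as_of < c.vend
    JOIN   disk_speed    s ON  s.disk_id = d.disk_id
                           AND s.vstart <= :as_of AND :as_of < s.vend;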

Do not "add an indexed 'current' column" to your tables. First, they will not help you transitioning to the new features when your engine has them, and second, Y/N columns are very bad discriminators, and therefore very poor candidates for indexing. I'd rather add your start- or end-mit to the index, they can be expected to give you the same wins for the 'current' rows, and a better win for the non-current rows, whenever you need to query those.

As for the enforcement of database constraints such as non-overlap of time periods in temporal keys and inclusion of time periods in temporal RI, well, you're entirely on your own. Write the code you need in triggers, SPROCs, or application code, in decreasing order of preference.
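
A minimal sketch of the trigger route in PostgreSQL, again against the illustrative disk_capacity table; it only checks non-overlap per disk and ignores concurrency and temporal RI:

    CREATE OR REPLACE FUNCTION disk_capacity_no_overlap() RETURNS trigger AS $$
    BEGIN
        -- Reject a row whose [vstart, vend) period overlaps another row for
        -- the same disk (the row's own vstart, part of the primary key, is
        -- excluded from the check).
        IF EXISTS (
            SELECT 1
            FROM   disk_capacity c
            WHERE  c.disk_id = NEW.disk_id
            AND    c.vstart <> NEW.vstart
            AND    c.vstart <  NEW.vend
            AND    NEW.vstart < c.vend
        ) THEN
            RAISE EXCEPTION 'overlapping period for disk %', NEW.disk_id;
        END IF;
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER disk_capacity_no_overlap_trg
    BEFORE INSERT OR UPDATE ON disk_capacity
    FOR EACH ROW EXECUTE PROCEDURE disk_capacity_no_overlap();

On more recent PostgreSQL versions (9.2 and up), range types plus an exclusion constraint can express the same non-overlap rule declaratively, which may be preferable to a hand-written trigger.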

Was this more helpful?

answered Sep 25 '22 by Erwin Smout