How to Store Historical Data [closed]

People also ask

How does mysql store historical data?

The first thing we need to do is clone the original table's schema to create our history table. By using a suffix of _history rather than a prefix, you keep your history tables beside the ones they track in your database list.

How do you maintain database history?

One simple way to keep version history is to create basically an identical table (eg. with _version suffix). Both of the tables would have a version field, which for the main table you increment for every update you do. The version table would have a composite primary key on (id, version).

Is it important to keep historical data?

Historical data enables the tracking ofimprovement over time which gives key insights. These insights are essential for driving a business. Marketers are always on the run to better understand and segment the customers. Keeping historical data can help marketers understand iftheir customer segment is changing.

What can you do with historical data?

The use of historical data has become a standard tool in economics, serving three main purposes: to examine the influence of the past on current economic outcomes; to use unique natural experiments to test modern economic theories; and to use modern economic theories to refine our understanding of important historical ...

Supporting historical data directly within an operational system will make your application much more complex than it would otherwise be. Generally, I would not recommend doing it unless you have a hard requirement to manipulate historical versions of a record within the system.

If you look closely, most requirements for historical data fall into one of two categories:

Audit logging: This is better off done with audit tables. It's fairly easy to write a tool that generates scripts to create audit log tables and triggers by reading metadata from the system data dictionary. This type of tool can be used to retrofit audit logging onto most systems. You can also use this subsystem for changed data capture if you want to implement a data warehouse (see below).
Historical reporting: Reporting on historical state, 'as-at' positions or analytical reporting over time. It may be possible to fulfil simple historical reporting requirements by quering audit logging tables of the sort described above. If you have more complex requirements then it may be more economical to implement a data mart for the reporting than to try and integrate history directly into the operational system.

Slowly changing dimensions are by far the simplest mechanism for tracking and querying historical state and much of the history tracking can be automated. Generic handlers aren't that hard to write. Generally, historical reporting does not have to use up-to-the-minute data, so a batched refresh mechanism is normally fine. This keeps your core and reporting system architecture relatively simple.

If your requirements fall into one of these two categories, you are probably better off not storing historical data in your operational system. Separating the historical functionality into another subsystem will probably be less effort overall and produce transactional and audit/reporting databases that work much better for their intended purpose.

I don't think there is a particular standard way of doing it but I thought I would throw in a possible method. I work in Oracle and our in-house web application framework that utilizes XML for storing application data.

We use something called a Master - Detail model that at it's simplest consists of:

Master Table for example calledWidgets often just containing an ID. Will often contain data that won't change over time / isn't historical.

Detail / History Table for example called Widget_Details containing at least:

ID - primary key. Detail/historical ID
MASTER_ID - for example in this case called 'WIDGET_ID', this is the FK to the Master record
START_DATETIME - timestamp indicating the start of that database row
END_DATETIME - timestamp indicating the end of that database row
STATUS_CONTROL - single char column indicated status of the row. 'C' indicates current, NULL or 'A' would be historical/archived. We only use this because we can't index on END_DATETIME being NULL
CREATED_BY_WUA_ID - stores the ID of the account that caused the row to be created
XMLDATA - stores the actual data

So essentially, a entity starts by having 1 row in the master and 1 row in the detail. The detail having a NULL end date and STATUS_CONTROL of 'C'. When an update occurs, the current row is updated to have END_DATETIME of the current time and status_control is set to NULL (or 'A' if preferred). A new row is created in the detail table, still linked to the same master, with status_control 'C', the id of the person making the update and the new data stored in the XMLDATA column.

This is the basis of our historical model. The Create / Update logic is handled in an Oracle PL/SQL package so you simply pass the function the current ID, your user ID and the new XML data and internally it does all the updating / inserting of rows to represent that in the historical model. The start and end times indicate when that row in the table is active for.

Storage is cheap, we don't generally DELETE data and prefer to keep an audit trail. This allows us to see what our data looked like at any given time. By indexing status_control = 'C' or using a View, cluttering isn't exactly a problem. Obviously your queries need to take into account you should always use the current (NULL end_datetime and status_control = 'C') version of a record.

I think you approach is correct. Historical table should be a copy of the main table without indexes, make sure you have update timestamp in the table as well.

If you try the other approach soon enough you will face problems:

maintenance overhead
more flags in selects
queries slowdown
growth of tables, indexes

In SQL Server 2016 and above, there is a new feature called Temporal Tables that aims to solve this challenge with minimal effort from developer. The concept of temporal table is similar to Change Data Capture (CDC), with the difference that temporal table has abstracted most of the things that you had to do manually if you were using CDC.

Related questions
                            
                                Good tool to visualise database schema? [closed]
                            
                                How to store a list in a column of a database table
                            
                                What is the most efficient way to store tags in a database?
                            
                                Database design for a survey [closed]
                            
                                MySQL: multiple tables or one table with many columns?
                            
                                How to design a database for User Defined Fields?
                            
                                How do you effectively model inheritance in a database?
                            
                                SQL ON DELETE CASCADE, Which Way Does the Deletion Occur?
                            
                                Using a Single Row configuration table in SQL Server database. Bad idea?
                            
                                Database Structure for Tree Data Structure [closed]
                            
                                How to design a product table for many kinds of product where each product has many parameters
                            
                                NoSql vs Relational database
                            
                                Storing SHA1 hash values in MySQL
                            
                                Subqueries vs joins
                            
                                What should I name a table that maps two tables together? [closed]
                            
                                When/Why to use Cascading in SQL Server?
                            
                                What's wrong with nullable columns in composite primary keys?
                            
                                Database design for audit logging [closed]
                            
                                Implementing Comments and Likes in database
                            
                                Is there ever a time where using a database 1:1 relationship makes sense?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

How to Store Historical Data [closed]

Tags:

versioning

database-design

People also ask

Recent Activity

Donate For Us