Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to implement temporal data in MySQL

I currently have a non-temporal MySQL DB and need to change it to a temporal MySQL DB. In other words, I need to be able to retain a history of changes that have been made to a record over time for reporting purposes.

My first thought for implementing this was to simply do inserts into the tables instead of updates, and when I need to select the data, simply doing a GROUP BY on some column and ordering by the timestamp DESC.

However, after thinking about things a bit, I realized that that will really mess things up because the primary key for each insert (which would really just be simulating a number of updates on a single record) will be different and thus mess up any linkage that uses the primary key to link to other records in the DB.

As such, my next thought was to continue updating the main tables in the DB, but also create a new insert into an "audit table" that is simply a copy of the full record after the update, and then when I needed to report on temporal data, I could use the audit table for querying purposes.

Can someone please give me some guidance or links on how to properly do this?
Thank you.

like image 876
HartleySan Avatar asked Jul 06 '15 18:07

HartleySan


2 Answers

Make the given table R temporal(ie, to maintain the history).

One design is to leave the table R as it is and create a new table R_Hist with valid_start_time and valid_end_time. Valid time is the time when the fact is true.

The CRUD operations can be given as:

INSERT

  • Insert into both R
  • Insert into R_Hist with valid_end_time as infinity

UPDATE

  • Update in R
  • Insert into R_Hist with valid_end_time as infinity
  • Update valid_end_time with the current time for the “latest” tuple

DELETE

  • Delete from R
  • Update valid_end_time with the current time for the “latest” tuple

SELECT

  • Select from R for ‘snapshot’ queries (implicitly ‘latest’ timestamp)
  • Select from R_Hist for temporal operations

Instead, you can choose to design new table for every attribute of table R. By this particular design you can capture attribute level temporal data as opposed to entity level in the previous design. The CRUD operations are almost similar.

like image 191
Shashank M Avatar answered Sep 20 '22 14:09

Shashank M


I did a column Deleted and a column DeletedDate. Deleted defaults to false and deleted date null.

Complex primary key on IDColumn, Deleted, and DeletedDate.

Can index by deleted so you have real fast queries.

No duplicate primary key on your IDColumn because your primary key includes deleted and deleted date.

Assumption: you won't write to the same record more than once a millisecond. Could cause duplicate primary key issue if deleted date is not unique.

So then I do a transaction type deal for updates: select row, take results, update specific values, then insert. Really its an update to deleted true deleted date to now() then you have it spit out the row after update and use that to get primary key and/or any values not available to whatever API you built.

Not as good as a temporal table and takes some discipline but it builds history into 1 table that is easy to report on.

I may start updating the deleted date column and change it to added/Deleted in addition to the added date so I can sort records by 1 column, the added/deleted column while always updated the addedBy column and just set the same value as the added/Deleted column for logging sake.

Either way could just do a complex case when not null as addedDate else addedDate as addedDate order by AddedDate desc. so, yeah, whatever, this works.

like image 43
Alex Avatar answered Sep 20 '22 14:09

Alex