In at least one application, I need to keep old versions of records in a relational database. When a record is updated, a new copy should be added instead and the old row marked as not current. When a record is deleted, it should instead be marked as not current or deleted.
There is a simple use case of this: New versions of a record can only be added at the current time, superseding one row each. This can be used for archiving previous records when saving new data. For this, I'd add the following columns to each table:
VersionTime datetime -- Time when this version becomes effective
IsCurrent bool -- Indicates whether this version is the most current (and not deleted)
This is good if you only need to know what the most current version of a record is, and only enumerate previous versions of a single record separately. Point-in-time queries are even more painful than with the second variant.
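A minimal sketch of this first variant, using Python's built-in sqlite3 module only as a convenient way to run the SQL. The table name setting, the EntityId column that ties all versions of one record together, and the helper save_version are assumptions made for illustration; only VersionTime and IsCurrent come from the description above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE setting (
        EntityId    INTEGER NOT NULL,  -- hypothetical key shared by all versions
        Value       TEXT,              -- hypothetical payload column
        VersionTime TEXT NOT NULL,     -- ISO-8601 time this version became effective
        IsCurrent   INTEGER NOT NULL   -- 1 = most current, 0 = superseded or deleted
    )
""")

def save_version(conn, entity_id, value, now):
    """Mark the current row (if any) as not current, then add the new version."""
    with conn:  # one transaction: both statements commit together or not at all
        conn.execute(
            "UPDATE setting SET IsCurrent = 0 WHERE EntityId = ? AND IsCurrent = 1",
            (entity_id,),
        )
        conn.execute(
            "INSERT INTO setting (EntityId, Value, VersionTime, IsCurrent) "
            "VALUES (?, ?, ?, 1)",
            (entity_id, value, now),
        )

save_version(conn, 1, "old", "2013-01-01T00:00:00")
save_version(conn, 1, "new", "2014-01-01T00:00:00")

# The most current version is a plain filter on IsCurrent.
current = conn.execute(
    "SELECT Value FROM setting WHERE EntityId = 1 AND IsCurrent = 1"
).fetchall()

# Previous versions of a single record are enumerated separately.
history = conn.execute(
    "SELECT Value FROM setting WHERE EntityId = 1 ORDER BY VersionTime"
).fetchall()
```

Reading the current state stays cheap here, but finding the version that applied at an arbitrary past time means comparing VersionTime values across rows, which is why point-in-time queries are awkward with this layout.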
A more generic variant is this: Versions of records can be added at any time for any specified validity time range. So I could declare that some setting of an entity is valid until the end of 2013, another version of it is valid in 2014, and yet another version will be valid from 2015 on. This can be used both to archive old data (as above) and to plan ahead to use different data at some time in the future (and to keep this information as an archive). For this, I'd add the following columns to each table:
ValidFrom datetime -- Time when this version becomes valid (inclusive)
ValidTo datetime -- Time when this version becomes invalid (exclusive)
The second approach can basically represent the first as well, but it's harder to know what version is the most recent - because you can also add versions for the future. Also, the ValidFrom/ValidTo design is able to declare overlapping ranges, and by definition, the row with the highest ValidFrom shall apply in that case.
Now I'm wondering how to implement an efficient solution to manage and query such data. Normally you can just write any SQL queries with any kind of WHERE, GROUP BY and JOIN to get the records you want. But with versioning applied, you need to consider the correct version of each record. So instead of joining every version of a record from another table, an appropriate condition must be added to only select the version that is valid at a given time.
An example:
SELECT a, b, c
FROM t1
Must be changed to:
SELECT a, b, c
FROM t1
WHERE t1.ValidFrom <= :time AND t1.ValidTo > :time
ORDER BY t1.ValidFrom DESC -- the row with the highest ValidFrom applies
LIMIT 1
More complex with a table join:
SELECT a, b, c
FROM t1
LEFT JOIN t2 ON (t2.a = t1.a)
Must be changed to:
SELECT a, b, c
FROM t1
LEFT JOIN t2 ON (t2.a = t1.a
AND t2.ValidFrom <= :time AND t2.ValidTo > :time) -- filter t2 in ON so unmatched t1 rows are kept
WHERE t1.ValidFrom <= :time AND t1.ValidTo > :time
This still doesn't handle selecting the right version of overlapping time spans. I could add some clean-up method that flattens out overlapping version time ranges, but I don't know how efficient that would be.
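One way to resolve overlaps at query time, rather than flattening them beforehand, is to exclude any row for which a later-starting row of the same entity is also valid at :time. The sketch below runs that query through Python's sqlite3 module. It assumes that column a identifies the entity across all of its versions, that timestamps are stored as ISO-8601 text so they compare correctly as strings, and that an open-ended range is stored with a far-future ValidTo; the sample rows and the helper rows_as_of are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (a INTEGER, b TEXT, ValidFrom TEXT, ValidTo TEXT)")
conn.executemany("INSERT INTO t1 VALUES (?, ?, ?, ?)", [
    (1, "base",     "2013-01-01", "9999-12-31"),  # open-ended base version
    (1, "override", "2014-01-01", "2015-01-01"),  # overlaps the base during 2014
    (2, "only",     "2013-01-01", "2014-01-01"),
])

AS_OF_SQL = """
    SELECT t1.a, t1.b
    FROM t1
    WHERE t1.ValidFrom <= :time AND t1.ValidTo > :time
      AND NOT EXISTS (
          -- a row of the same entity that is also valid but starts later wins
          SELECT 1 FROM t1 AS newer
          WHERE newer.a = t1.a
            AND newer.ValidFrom <= :time AND newer.ValidTo > :time
            AND newer.ValidFrom > t1.ValidFrom
      )
    ORDER BY t1.a
"""

def rows_as_of(conn, time):
    """Return one row per entity: the valid version with the highest ValidFrom."""
    return conn.execute(AS_OF_SQL, {"time": time}).fetchall()

in_2013 = rows_as_of(conn, "2013-06-01")
in_2014 = rows_as_of(conn, "2014-06-01")
in_2015 = rows_as_of(conn, "2015-06-01")
```

Two versions of the same entity with an identical ValidFrom would both be returned, so a tie-breaker (or a uniqueness constraint on the key plus ValidFrom) is still needed. How well this performs depends on the database and on an index covering the key and ValidFrom; that is not measured here.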
I'm seeking to create a class (in C# in my case) that provides methods to read and write such versioned records. The writing is relatively easy because the queries are simple and easy to control with transactions. But querying would require building an API that accepts every fragment of an SQL SELECT query and intelligently builds the SQL query to execute from that. The query method should only accept one additional parameter that specifies the time to fetch the data from. Depending on each entity's validity range, a different version of each would be selected.
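A possible alternative to parsing every fragment of a SELECT is to leave the caller's query untouched and put a filtered "as of" view of each versioned table in front of it as a common table expression. The sketch below is written in Python with sqlite3 rather than C#, purely to keep it short and runnable. The helper names as_of_cte and query_as_of, the _at suffix, and the requirement that the caller names the key column of each table are all assumptions, not an established API.

```python
import re
import sqlite3

_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def as_of_cte(table, key):
    """Build a CTE '<table>_at' holding the version of each entity valid at :time."""
    for name in (table, key):
        if not _IDENT.fullmatch(name):  # identifiers cannot be bound as parameters
            raise ValueError("unsafe identifier: %r" % (name,))
    return (
        f"{table}_at AS ("
        f" SELECT v.* FROM {table} AS v"
        f" WHERE v.ValidFrom <= :time AND v.ValidTo > :time"
        f" AND NOT EXISTS ("
        f"  SELECT 1 FROM {table} AS n"
        f"  WHERE n.{key} = v.{key}"
        f"  AND n.ValidFrom <= :time AND n.ValidTo > :time"
        f"  AND n.ValidFrom > v.ValidFrom))"
    )

def query_as_of(conn, select_sql, tables, time, params=None):
    """Run select_sql against the '<table>_at' views; tables maps table -> key column."""
    ctes = ", ".join(as_of_cte(t, k) for t, k in tables.items())
    bound = dict(params or {}, time=time)
    return conn.execute(f"WITH {ctes} {select_sql}", bound).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (a INTEGER, b TEXT, ValidFrom TEXT, ValidTo TEXT)")
conn.execute("CREATE TABLE t2 (a INTEGER, c TEXT, ValidFrom TEXT, ValidTo TEXT)")
conn.executemany("INSERT INTO t1 VALUES (?, ?, ?, ?)", [
    (1, "x1", "2013-01-01", "2014-01-01"),
    (1, "x2", "2014-01-01", "9999-12-31"),
    (2, "y",  "2013-01-01", "9999-12-31"),
])
conn.execute("INSERT INTO t2 VALUES (1, 'c1', '2013-01-01', '2014-01-01')")

# The caller writes an ordinary query, only against the *_at names.
JOIN_SQL = (
    "SELECT t1_at.a, t1_at.b, t2_at.c FROM t1_at "
    "LEFT JOIN t2_at ON t2_at.a = t1_at.a ORDER BY t1_at.a"
)
keys = {"t1": "a", "t2": "a"}
in_2013 = query_as_of(conn, JOIN_SQL, keys, "2013-06-01")
in_2014 = query_as_of(conn, JOIN_SQL, keys, "2014-06-01")
```

The design choice here is that WHERE, GROUP BY and JOIN clauses stay ordinary SQL and the time parameter is the only extra argument. The same idea could be expressed with database views or table-valued functions where the database supports them.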
These are basically my incomplete thoughts about versioning data and providing an API to manage it. Have you already done such a thing and would like to tell me what you think of it? Do you have another idea that worked well? Could you offer me any advice on how to implement this API? While I theoretically know how to do it, I think it's a lot of work and I can't estimate how well it will work.
There is no generally-accepted place to store a version number in a database schema. Databases don't have version numbers by convention. There is no standard way of putting in this or any other information about an application into the database.
SQL Source Control plugs into SQL Server Management Studio (SSMS) and links your databases to an existing version control system, such as Git, TFS or Subversion. This allows you to manage changes to database schema and static data alongside application code.
If you need old data to be part of your business logic, then:
If old data is just a trace log of changes, then:
I know that this is an old post, but I wanted to reply not only to provide a solution but also to exchange ideas with you and to discuss the most efficient solution for this important issue of versioning.
My idea is,
Create a table that contains 5 main versioning fields
When updating a record:
Update the existing row to set ValidTo to the current time (NOW) and IsCurrent to false.
Insert a new row that increments the Serial field and keeps the same ID as the updated record. ValidFrom will be NOW, ValidTo will be null, and IsCurrent will be true.
When deleting a record:
Set ValidTo to NOW and IsCurrent to false.
This way you will not have problems with joins, because joining tables on the ID field shows you the entire record history.
If you have FKs to a parent table, you probably want to remove the value of the FK field.
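The update and delete steps above can be sketched as follows, again using Python's sqlite3 module just to run the SQL. The five versioning fields are taken to be ID, Serial, ValidFrom, ValidTo and IsCurrent, which is inferred from the fields named in the steps. The table name item, the Name payload column, and the helper functions are hypothetical, and the sketch assumes the newly inserted row is the one flagged as current.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE item (
        ID        INTEGER NOT NULL,  -- same value for every version of a record
        Serial    INTEGER NOT NULL,  -- incremented for each new version
        ValidFrom TEXT    NOT NULL,
        ValidTo   TEXT,              -- NULL while the version is still open
        IsCurrent INTEGER NOT NULL,  -- 1 only for the live version
        Name      TEXT,              -- hypothetical payload column
        PRIMARY KEY (ID, Serial)
    )
""")

def update_item(conn, item_id, name, now):
    """Close the current version (if any) and insert the next one."""
    with conn:  # both statements run in a single transaction
        row = conn.execute(
            "SELECT MAX(Serial) FROM item WHERE ID = ?", (item_id,)
        ).fetchone()
        next_serial = (row[0] or 0) + 1
        conn.execute(
            "UPDATE item SET ValidTo = ?, IsCurrent = 0 "
            "WHERE ID = ? AND IsCurrent = 1",
            (now, item_id),
        )
        conn.execute(
            "INSERT INTO item (ID, Serial, ValidFrom, ValidTo, IsCurrent, Name) "
            "VALUES (?, ?, ?, NULL, 1, ?)",
            (item_id, next_serial, now, name),
        )

def delete_item(conn, item_id, now):
    """Soft delete: close the current version without inserting a new one."""
    with conn:
        conn.execute(
            "UPDATE item SET ValidTo = ?, IsCurrent = 0 "
            "WHERE ID = ? AND IsCurrent = 1",
            (now, item_id),
        )

update_item(conn, 7, "first", "2013-01-01")
update_item(conn, 7, "second", "2014-01-01")
versions = conn.execute(
    "SELECT Serial, ValidFrom, ValidTo, IsCurrent, Name "
    "FROM item WHERE ID = 7 ORDER BY Serial"
).fetchall()
```

Calling update_item for an ID that has no rows yet simply inserts Serial 1, so the same helper also covers the initial insert in this sketch.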