How to efficiently version records in an SQL database

In at least one application, I need to keep old versions of records in a relational database. When a record is updated, a new copy is added instead and the old row is marked as not current. When a record is deleted, it is instead marked as not current or deleted.

There is a simple use case of this: New versions of a record can only be added at the current time, superseding one row each. This can be used for archiving previous records when saving new data. For this, I'd add the following columns to each table:

VersionTime datetime -- Time when this version becomes effective
IsCurrent bool -- Indicates whether this version is the most current (and not deleted)
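
As a minimal sketch of how these two columns could be maintained, the following uses Python's built-in sqlite3 module purely for illustration. The table name `setting` and the columns `VersionId`, `EntityId` and `Value` are assumptions that are not part of the question; only `VersionTime` and `IsCurrent` come from the design above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE setting (
        VersionId   INTEGER PRIMARY KEY,  -- key of one version row (assumed)
        EntityId    INTEGER NOT NULL,     -- logical record identity (assumed)
        Value       TEXT,                 -- example payload (assumed)
        VersionTime TEXT NOT NULL,        -- time when this version becomes effective
        IsCurrent   INTEGER NOT NULL      -- 1 for the most current, non-deleted version
    )""")

def save_version(conn, entity_id, value, now):
    """Supersede the current row of entity_id with a new version."""
    with conn:  # one transaction: both statements commit, or neither does
        conn.execute(
            "UPDATE setting SET IsCurrent = 0 WHERE EntityId = ? AND IsCurrent = 1",
            (entity_id,))
        conn.execute(
            "INSERT INTO setting (EntityId, Value, VersionTime, IsCurrent) "
            "VALUES (?, ?, ?, 1)",
            (entity_id, value, now))

def delete_record(conn, entity_id):
    """Soft delete: afterwards no row of the entity is marked current."""
    with conn:
        conn.execute(
            "UPDATE setting SET IsCurrent = 0 WHERE EntityId = ? AND IsCurrent = 1",
            (entity_id,))

save_version(conn, 1, "red", "2014-01-01 00:00:00")
save_version(conn, 1, "blue", "2014-02-01 00:00:00")
current = conn.execute(
    "SELECT Value FROM setting WHERE EntityId = 1 AND IsCurrent = 1").fetchall()
history = conn.execute(
    "SELECT Value FROM setting WHERE EntityId = 1 ORDER BY VersionTime").fetchall()
```

Reading the current version is a plain `IsCurrent = 1` filter, which is why this variant is cheap for the common case.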

This works well if you only need the most current version of a record and only enumerate the previous versions of a single record separately. Point-in-time queries, however, are even more painful than with the second variant.

A more generic variant is this: Versions of records can be added at any time for any specified validity time range. So I could declare that some setting of an entity is valid until the end of 2013, another version of it is valid in 2014, and yet another version is valid from 2015 on. This can be used both to archive old data (as above) and to plan ahead to use different data at some time in the future (and to keep this information as an archive). For this, I'd add the following columns to each table:

ValidFrom datetime -- Time when this version becomes valid (inclusive)
ValidTo datetime -- Time when this version becomes invalid (exclusive)

The second approach can basically represent the first as well, but it's harder to know which version is the most recent, because you can also add versions for the future. Also, the ValidFrom/ValidTo design allows overlapping ranges to be declared; by definition, the row with the highest ValidFrom shall apply in that case.

Now I'm wondering how to implement an efficient solution to manage and query such data. Normally you can just write any SQL queries with any kind of WHERE, GROUP BY and JOIN to get the records you want. But with versioning applied, you need to consider the correct version of each record. So instead of joining every version of a record from another table, an appropriate condition must be added to only select the version that is valid at a given time.

An example:

SELECT a, b, c
FROM t1

Must be changed to:

SELECT a, b, c
FROM t1
WHERE t1.ValidFrom <= :time AND t1.ValidTo > :time
ORDER BY t1.ValidFrom DESC -- the highest ValidFrom wins when ranges overlap
LIMIT 1

More complex with a table join:

SELECT a, b, c
FROM t1
    LEFT JOIN t2 ON (t2.a = t1.a)

Must be changed to:

SELECT a, b, c
FROM t1
    -- t2's validity filter goes into ON; in WHERE it would discard t1 rows without a match
    LEFT JOIN t2 ON (t2.a = t1.a
        AND t2.ValidFrom <= :time AND t2.ValidTo > :time)
WHERE t1.ValidFrom <= :time AND t1.ValidTo > :time

This still doesn't handle selecting the right version of overlapping time spans. I could add some clean-up method that flattens out overlapping version time ranges, but I don't know how efficient that would be.
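
One possible way to resolve overlaps at query time, instead of flattening them in a clean-up step, is a NOT EXISTS filter that keeps only the valid row with the highest ValidFrom per record. The sketch below uses Python's sqlite3 module for illustration and assumes a column `EntityId` that identifies the logical record; that column is not part of the question. If two valid versions share the same ValidFrom, both are returned, so a tie-breaker would still be needed in that case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (
        EntityId  INTEGER NOT NULL,  -- logical record identity (assumed column)
        a         TEXT,
        ValidFrom TEXT NOT NULL,     -- inclusive
        ValidTo   TEXT NOT NULL      -- exclusive
    );
    INSERT INTO t1 VALUES
        (1, 'base',     '2013-01-01', '2016-01-01'),
        (1, 'override', '2014-01-01', '2015-01-01'),  -- overlaps the row above
        (2, 'other',    '2013-06-01', '2014-06-01');
""")

AS_OF_SQL = """
    SELECT v.EntityId, v.a
    FROM t1 AS v
    WHERE v.ValidFrom <= :time AND v.ValidTo > :time
      AND NOT EXISTS (            -- no other valid version starts later
          SELECT 1 FROM t1 AS w
          WHERE w.EntityId = v.EntityId
            AND w.ValidFrom <= :time AND w.ValidTo > :time
            AND w.ValidFrom > v.ValidFrom)
    ORDER BY v.EntityId
"""

def rows_as_of(conn, time):
    """Return one row per record: the version that applies at `time`."""
    return conn.execute(AS_OF_SQL, {"time": time}).fetchall()
```

With an index on (EntityId, ValidFrom) the inner lookup should stay cheap, but that would need to be measured on real data.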

I'm seeking to create a class (in C# in my case) that provides methods to read and write such versioned records. Writing is relatively easy because the queries are simple and easy to control with transactions. But querying would require building an API that accepts every fragment of an SQL SELECT query and intelligently builds the SQL query to execute from that. The query method should only accept one additional parameter that specifies the time to fetch the data from. Depending on each entity's validity range, a different version of each would be selected.
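
A lighter alternative to parsing every fragment of a SELECT could be to substitute each versioned table with a derived table that already exposes only the version valid at the requested time, and to let the caller write ordinary SQL against it. The following is only a sketch of that idea in Python with sqlite3 (the same string building would carry over to C#). The helpers `as_of` and `query_as_of`, the tables `customer` and `address`, and the `EntityId` key column are all hypothetical.

```python
import sqlite3

def as_of(table, key="EntityId"):
    """Return a derived-table SQL fragment exposing, for each `key` in `table`,
    only the version valid at :time (the highest ValidFrom wins).
    `table` and `key` must be trusted identifiers, never user input."""
    return f"""(
        SELECT v.* FROM {table} AS v
        WHERE v.ValidFrom <= :time AND v.ValidTo > :time
          AND NOT EXISTS (
              SELECT 1 FROM {table} AS w
              WHERE w.{key} = v.{key}
                AND w.ValidFrom <= :time AND w.ValidTo > :time
                AND w.ValidFrom > v.ValidFrom)
    )"""

def query_as_of(conn, sql, time, **params):
    """Run `sql` with :time bound to the requested point in time."""
    return conn.execute(sql, {"time": time, **params}).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (EntityId INTEGER, Name TEXT, ValidFrom TEXT, ValidTo TEXT);
    CREATE TABLE address  (EntityId INTEGER, CustomerId INTEGER, City TEXT,
                           ValidFrom TEXT, ValidTo TEXT);
    INSERT INTO customer VALUES (1, 'Ann',   '2013-01-01', '2014-01-01'),
                                (1, 'Ann B', '2014-01-01', '9999-12-31');
    INSERT INTO address  VALUES (10, 1, 'Oslo',   '2013-01-01', '2014-07-01'),
                                (10, 1, 'Bergen', '2014-07-01', '9999-12-31');
""")

# The caller writes a normal join; only the table references are wrapped.
sql = f"""
    SELECT c.Name, a.City
    FROM {as_of('customer')} AS c
        LEFT JOIN {as_of('address')} AS a ON a.CustomerId = c.EntityId
    ORDER BY c.Name
"""
```

Because the validity filter lives inside each derived table, a LEFT JOIN keeps its outer-join semantics and the single `:time` parameter applies to every table in the query.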

These are basically my incomplete thoughts about versioning data and providing an API to manage it. Have you already done such a thing and would like to tell me what you think of it? Do you have another idea that worked well? Could you offer me any advice on how to implement this API? While I theoretically know how to do it, I think it's a lot of work and I can't estimate how well it will work.

asked Jan 27 '14 14:01

ygoe

2 Answers

If old data needs to be part of your business logic, then:

Save the latest version in a master table (insert and update; a delete just changes the status column).
Take a snapshot into a detail table whenever an update happens (the snapshot is created before the update is applied).
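
One way to take such snapshots automatically is a BEFORE UPDATE trigger that copies the old row into the detail table. The sketch below uses SQLite through Python's sqlite3 module; the tables `product` and `product_history` and all of their columns are assumptions made for illustration, and trigger syntax differs between database systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (              -- master table: latest version only
        Id     INTEGER PRIMARY KEY,
        Name   TEXT,
        Price  REAL,
        Status TEXT NOT NULL DEFAULT 'active'  -- set to 'deleted' instead of DELETE
    );
    CREATE TABLE product_history (      -- detail table: one snapshot per update
        HistoryId  INTEGER PRIMARY KEY,
        ProductId  INTEGER NOT NULL,
        Name       TEXT,
        Price      REAL,
        Status     TEXT,
        ArchivedAt TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
    );
    -- Copy the row as it was before each update.
    CREATE TRIGGER product_snapshot BEFORE UPDATE ON product
    BEGIN
        INSERT INTO product_history (ProductId, Name, Price, Status)
        VALUES (OLD.Id, OLD.Name, OLD.Price, OLD.Status);
    END;
""")

with conn:
    conn.execute("INSERT INTO product (Id, Name, Price) VALUES (1, 'Widget', 9.5)")
    conn.execute("UPDATE product SET Price = 12.0 WHERE Id = 1")
    conn.execute("UPDATE product SET Status = 'deleted' WHERE Id = 1")  # soft delete

history = conn.execute(
    "SELECT Price, Status FROM product_history ORDER BY HistoryId").fetchall()
latest = conn.execute("SELECT Price, Status FROM product WHERE Id = 1").fetchone()
```

Queries for current data touch only the master table, so they need no version filter at all.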

[Image: revision history]

Another alternative is the Event Sourcing pattern.

If old data is just a trace log of changes, then:

An Entity–attribute–value approach may come in handy. An implementation sample can be found here.

answered Oct 29 '22 15:10

Mohsen Heydari

I know that this is an old post, but I wanted to reply not only to provide a solution but also to exchange ideas with you and to discuss the most efficient solution for this important issue of versioning.

My idea is,

Create a table that contains 5 main versioning fields

Serial (incremental number) is the real identifier and is used for joins
ID (self-foreign key) is equal to the (Serial) field value when the record is created
ValidFrom (date from which the record became active)
ValidTo (date from which the record became inactive) => will be null for a current version
IsCurrent (flag indicating that the record is active)

When updating a record

Update the existing record to set (ValidTo) to the NOW datetime and (IsCurrent) to false.
Insert a new record by incrementing the (Serial) field and keeping the very same (ID) field value as the updated record; (ValidFrom) will be NOW, (ValidTo) will be null and (IsCurrent) will be true.

When Deleting record

ValidTo is set to the NOW time and IsCurrent is set to false.

This way you will not have problems with joins, as joining tables on the ID field will show you the whole record history.

If you have FKs to a parent table, you probably want to remove the value of the FK field.
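
The create, update and delete steps described in this answer might look roughly as follows. This is a sketch in Python with sqlite3, assuming that the newly inserted version is the one flagged as current; the table name `item` and the `Name` payload column are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE item (
        Serial    INTEGER PRIMARY KEY AUTOINCREMENT,  -- identifies one version row
        ID        INTEGER,           -- Serial of the first version, shared by all versions
        Name      TEXT,              -- example payload column (assumed)
        ValidFrom TEXT NOT NULL,
        ValidTo   TEXT,              -- NULL while the version is current
        IsCurrent INTEGER NOT NULL
    )""")

def create(conn, name, now):
    """Insert the first version; its ID is set to its own Serial."""
    with conn:
        cur = conn.execute(
            "INSERT INTO item (Name, ValidFrom, ValidTo, IsCurrent) "
            "VALUES (?, ?, NULL, 1)", (name, now))
        conn.execute("UPDATE item SET ID = Serial WHERE Serial = ?", (cur.lastrowid,))
    return cur.lastrowid

def _close_current(conn, item_id, now):
    # Close the current version: stamp ValidTo and clear the flag.
    conn.execute(
        "UPDATE item SET ValidTo = ?, IsCurrent = 0 WHERE ID = ? AND IsCurrent = 1",
        (now, item_id))

def update(conn, item_id, name, now):
    """Close the current version and insert a new one with the same ID."""
    with conn:
        _close_current(conn, item_id, now)
        conn.execute(
            "INSERT INTO item (ID, Name, ValidFrom, ValidTo, IsCurrent) "
            "VALUES (?, ?, ?, NULL, 1)", (item_id, name, now))

def delete(conn, item_id, now):
    """Soft delete: close the current version without adding a new one."""
    with conn:
        _close_current(conn, item_id, now)

item_id = create(conn, "a", "2022-01-01")
update(conn, item_id, "b", "2022-02-01")
rows = conn.execute(
    "SELECT Serial, ID, Name, ValidTo, IsCurrent FROM item ORDER BY Serial").fetchall()
```

Filtering on `ID` returns the full history of a record, while `ID = ? AND IsCurrent = 1` returns its current version, if any.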

answered Oct 29 '22 15:10

Kareem Saadeldeen


Tags: sql, database, version, database-design
