Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is using only 3 timestamps for a bitemporal SQL database possible?

When implementing a bitemporal database in SQL, it is usually recommended to use the following timestamps:

  • ValidStart
  • ValidEnd
  • TransactionStart
  • TransactionEnd

I have used this approach a few times before, but I have always wondered why having only 3 timestamps, leaving TransactionEnd out, isn't just as correct an implementation. Here a transaction time range spans from TransactionStart to the next TransactionStart.

Are there any strong arguments for not only using 3 timestamps, which will limit the size of the database?

like image 426
danielhc Avatar asked Sep 29 '22 07:09

danielhc


2 Answers

As mentioned in a comment it's for simplicity, since it's somewhat harder to make certain queries without it.

Consider the following example. John is born in some location, Location1, on January first 1990, but is first registered to be born on the fifth.

The database table, Persons, now looks like this:

+----------+--------------+------------+----------+------------+----------+
|   Name   | Location     | valid_from | valid_to | trans_from | trans_to |
+----------+--------------+------------+----------+------------+----------+
| John     | Location1    | 01-01-1990 |99-99-9999| 05/01/1990 |99-99-9999|
+----------+--------------+------------+----------+------------+----------+

At this point, removing the trans_to column wouldn't cause too much trouble, but suppose the following:

After some years, say 20, John relocates to Location2, and inform the officials 20 days later. This will make the Persons table look like this

+----------+--------------+------------+----------+------------+----------+
|   Name   | Location     | valid_from | valid_to | trans_from | trans_to |
+----------+--------------+------------+----------+------------+----------+
| John     | Location1    | 01-01-1990 |99-99-9999| 05/01/1990 |20-01-2010|
| John     | Location1    | 01-01-1990 |01-01-2010| 20/01/2010 |99-99-9999|
| John     | Location2    | 01-01-2010 |99-99-9999| 20/01/2010 |99-99-9999|
+----------+--------------+------------+----------+------------+----------+

Suppose someone wanted to find out "Where does the system think John is living now" (transaction time), regardless of where he actually lives. This can (roughly) be queried in SQL in the following way

Select  Location
From    Persons
Where   Name = John AND trans_from > NOW AND trans_to < NOW

Suppose the transaction end time was removed

+----------+--------------+------------+----------+------------+
|   Name   | Location     | valid_from | valid_to | trans_from |
+----------+--------------+------------+----------+------------+
| John     | Location1    | 01-01-1990 |99-99-9999| 05/01/1990 |
| John     | Location1    | 01-01-1990 |01-01-2010| 20/01/2010 |
| John     | Location2    | 01-01-2010 |99-99-9999| 20/01/2010 |
+----------+--------------+------------+----------+------------+

The query above is of course no longer valid, but making logic for the same query in the last table would be somewhat difficult. Since the trans_to is missing it will have to be derived from the other rows in the table. For instance the implicit trans_to time for the first row (since its the oldest entry) is the trans_from from the second row, which is the newer of the two.

The transaction end time is thus either 9999-99-99, if the row is the newest, or it's the trans_from from the row immediately succeeding it.

This means that the data concerning a specific row, is not entirely kept in that row, and the rows form a dependency on each other, which is (of course) unwanted. Furthermore it can be quite difficult to determine which exact row is the immediate successor of a row, which can make the queries even more complex

like image 105
Andersnk Avatar answered Oct 21 '22 07:10

Andersnk


An example of using only one timestamp instead of two in an 1D temporal database:

I have a shop and I want to record when an user X was in my shop.

If I use a model with start-time and end-time, this info can be recorded as

X,1,2
X,3,4

so user X was in my shop between 1 and 2 and between 3 and 4. This is clear, simple and concise.

If I model my data with only start-time as a timestamp, I will have:

X,1
X,2
X,3
X,4

but how I can interpret this data? X from (1,2) and X from (3,4)? or X from (2,3) and X from (1,4)? or X from (1,2), (2,3), (3,4)? X from (4,inf) is valid?

To understand this data I need to add additional constraints/logic/information to my data or code: maybe the intervals are non-overlaping, maybe I add an id per object, etc. All this solutions are not working in all cases, can be difficult to be maintained and other issues.

For e.g.: if I add an id(a,b in this case) to every item, it will result :

X,a,1
X,a,2
X,b,3
X,b,4

instead to store my data in 2 rows,3 columns my data will be stored in 4 rows, 3 columns. Not only I don't have any benefits using this model but this model can be reduced to:

X,a, 1,2 
X,b, 3,4

further reduced to

X, 1,2
X, 3,4
like image 1
valentin Avatar answered Oct 21 '22 08:10

valentin