Excuse the long question!
We have two database tables, e.g. Car and Wheel. They are related in that a wheel belongs to a car and a car has multiple wheels. The wheels, however, can be changed without affecting the "version" of the car. The car's record can be updated (e.g. paint job) without affecting the version of the wheels (i.e. no cascade updating).
For example, Car table currently looks like this:
CarId, CarVer, VersionTime, Colour
1 1 9:00 Red
1 2 9:30 Blue
1 3 9:45 Yellow
1 4 10:00 Black
The Wheels table looks like this (this car only has two wheels!)
WheelId, WheelVer, VersionTime, CarId
1 1 9:00 1
1 2 9:40 1
1 3 10:05 1
2 1 9:00 1
So, there's been 4 versions of this two wheeled car. It's first wheel (WheelId 1) hasn't changed. The second wheel was changed (e.g. painted) at 10:05.
How do I efficiently do as of queries that can be joined to other tables as required? Note that this is a new database and we own the schema and can change it or add audit tables to make this query easier. We've tried one audit table approach (with columns: CarId, CarVersion, WheelId, WheelVersion, CarVerTime, WheelVerTime), but it didn't really improve our query.
Example query: Show the Car ID 1 as it was, including its wheel records as of 9:50. This query should result in these two rows being returned:
WheelId, WheelVer, WheelVerTime, CarId, CarVer, CarVerTime, CarColour
1 2 9:40 1 3 9:45 Yellow
2 1 9:00 1 3 9:45 Yellow
The best query we could come up with was this:
select c.CarId, c.VersionTime, w.WheelId,w.WheelVer,w.VersionTime,w.CarId
from Cars c,
( select w.WheelId,w.WheelVer,w.VersionTime,w.CarId
from Wheels w
where w.VersionTime <= "12 Jun 2009 09:50"
group by w.WheelId,w.CarId
having w.WheelVer = max(w.WheelVer)
) w
where c.CarId = w.CarId
and c.CarId = 1
and c.VersionTime <= "12 Jun 2009 09:50"
group by c.CarId, w.WheelId,w.WheelVer,w.VersionTime,w.CarId
having c.CarVer = max(c.CarVer)
And, if you wanted to try this then the create table and insert record SQL is here:
create table Wheels
(
WheelId int not null,
WheelVer int not null,
VersionTime datetime not null,
CarId int not null,
PRIMARY KEY (WheelId,WheelVer)
)
go
insert into Wheels values (1,1,'12 Jun 2009 09:00', 1)
go
insert into Wheels values (1,2,'12 Jun 2009 09:40', 1)
go
insert into Wheels values (1,3,'12 Jun 2009 10:05', 1)
go
insert into Wheels values (2,1,'12 Jun 2009 09:00', 1)
go
create table Cars
(
CarId int not null,
CarVer int not null,
VersionTime datetime not null,
colour varchar(50) not null,
PRIMARY KEY (CarId,CarVer)
)
go
insert into Cars values (1,1,'12 Jun 2009 09:00', 'Red')
go
insert into Cars values (1,2,'12 Jun 2009 09:30', 'Blue')
go
insert into Cars values (1,3,'12 Jun 2009 09:45', 'Yellow')
go
insert into Cars values (1,4,'12 Jun 2009 10:00', 'Black')
go
This kind of table is known as a valid-time state table in the literature. It is universally accepted that each row should model a period by having a start date and an end date. Basically, the unit of work in SQL is the row and a row should completely define the entity; by having just one date per row, not only do your queries become more complex, your design is compromised by splitting sub atomic parts on to different rows.
As mentioned by Erwin Smout, one of the definitive books on the subject is:
Richard T. Snodgrass (1999). Developing Time-Oriented Database Applications in SQL
It's out of print but happily is available as a free download PDF (link above).
I have actually read it and have implemented many of the concepts. Much of the text is in ISO/ANSI Standard SQL-92 and although some have been implemented in proprietary SQL syntaxes, including SQL Server (also available as downloads) I found the conceptual information much more useful.
Joe Celko also has a book, 'Thinking in Sets: Auxiliary, Temporal, and Virtual Tables in SQL', largely derived from Snodgrass's work, though I have to say where the two diverge I find Snodgrass's approaches preferable.
I concur this stuff is hard to implement in the SQL products we currently have. We think long and hard before making data temporal; if we can get away with merely 'historical' then we will. Much of the temporal functionality in SQL-92 is missing from SQL Server e.g. INTERVAL, OVERLAPS, etc. Some things as fundamental as sequenced 'primary keys' to ensure periods do not overlap cannot be implemented using CHECK constraints in SQL Server, necessitating triggers and/or UDFs.
Snodgrass's book is based on his work for SQL3, a proposed extension to Standard SQL to provide much better support for temporal databases, though sadly this seems to have been effectively shelved years ago :(
As-of queries are easier when each row has a start and an end time. Storing the end time in the table would be most efficient, but if this is hard, you can query it like:
select
ThisCar.CarId
, StartTime = ThisCar.VersionTime
, EndTime = NextCar.VersionTime
from Cars ThisCar
left join Cars NextCar
on NextCar.CarId = ThisCar.CarId
and ThisCar.VersionTime < NextCar.VersionTime
left join Cars BetweenCar
on BetweenCar.CarId = BetweenCar.CarId
and ThisCar.VersionTime < BetweenCar.VersionTime
and BetweenCar.VersionTime < NextCar.VersionTime
where BetweenCar.CarId is null
You can store this in a view. Say the view is called vwCars, you can select a car for a particular date like:
select *
from vwCars
where StartTime <= '2009-06-12 09:15'
and ('2009-06-12 09:15' < EndTime or EndTime is null)
You could store this in a table valued stored procedure, but that might have a steep performance penalty.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With