I'm looking into implementing object-versioning with the added twist of needing to have both live and draft objects, and could use the insights from someone experience in this, as I'm beginning to wonder if it's even possible without potentially horrific hacks.
I'll break it down to posts with tags for the sake of the example, but my use-case is a bit more general (involving slowly changing dimensions - http://en.wikipedia.org/wiki/Slowly_changing_dimension).
Suppose you've a posts table, a tags table, and a post2tag table:
posts (
id
)
tags (
id
)
post2tag (
post_id fkey posts(id),
tag_id fkey tags(id)
)
I'm in need of a couple of things:
I've been investigating various options. So far, the best I've come up with (without points #4/#5) looks a bit like the SCD type6-hybrid setup, but instead of having a current boolean there's a materialized view for the current row. For all intents and purposes, it looks like this:
posts (
id pkey,
public,
created_at,
updated_at,
updated_by
)
post_revs (
id,
rev pkey,
public,
created_at,
created_by,
deleted_at
)
tags (
id pkey,
public,
created_at,
updated_at,
updated_by
)
tag_revs (
id,
public,
rev pkey,
created_at,
created_by,
deleted_at
)
post2tag (
post_id fkey posts(id),
tag_id fkey tags(id),
public,
created_at,
updated_at,
updated_by
)
post2tag_revs (
post_id,
tag_id,
post_rev fkey post_revs(rev), -- the rev when the relation started
tag_rev fkey tag_revs(rev), -- the rev when the relation started
public,
created_at,
created_by,
deleted_at,
pkey (post_rev, tag_rev)
)
I'm using pg_temporal to maintain indexes on period(created_at, deleted_at). And I keep the various tables in sync using triggers. Yada yada yada... I created the triggers that allow to cancel an edit to posts/tags in such a way that the draft gets stored into the revs without being published. It works great.
Except when I need to worry about draft-row related relations on post2tag. In that case, all hell breaks loose, and this hints to me that I've some kind of design problem in there. But I'm running out of ideas...
I've considered introducing data duplication (i.e. n post2tag rows introduced for each draft revision). This kind of works, but tends to be a lot slower than I'd like it to be.
I've considered introducing drafts tables for the "last draft", but this quickly tends to become very very ugly.
I've considered all sorts of flags...
So question: is there a generally accepted means of managing live vs non-live rows in a row-version controlled environment? And if not, what have you tried and been reasonably successful with?
Anchor modeling is a nice way to implement a temporal dB -- see the Wikipedia article too.
Takes some time to get used to, but work nice.
There is an online modeling tool and if you load the supplied XML file [File -> Load Model from Local File]
you should see something like this -- also use [Layout --> Togle Names]
.
The [Generate --> SQL Code]
will produce DDL for tables, views and point-in-time functions.
The code is quite long, so I am not posting it here. Check the code out -- you may need to modify it
for your DB.
Here is the file to load into modeling tool.
<schema>
<knot mnemonic="EXP" descriptor="Expired" identity="smallint" dataRange="char(1)">
<identity generator="true"/>
<layout x="713.96" y="511.22" fixed="true"/>
</knot>
<anchor mnemonic="US" descriptor="User" identity="int">
<identity generator="true"/>
<attribute mnemonic="USN" descriptor="UserName" dataRange="varchar(32)">
<layout x="923.38" y="206.54" fixed="true"/>
</attribute>
<layout x="891.00" y="242.00" fixed="true"/>
</anchor>
<anchor mnemonic="PO" descriptor="Post" identity="int">
<identity generator="true"/>
<attribute mnemonic="TIT" descriptor="Title" dataRange="varchar(2)">
<layout x="828.00" y="562.00" fixed="true"/>
</attribute>
<layout x="855.00" y="471.00" fixed="true"/>
</anchor>
<anchor mnemonic="TG" descriptor="Tag" identity="int">
<identity generator="true"/>
<attribute mnemonic="TGT" descriptor="TagText" dataRange="varchar(32)">
<layout x="551.26" y="331.69" fixed="true"/>
</attribute>
<layout x="637.29" y="263.43" fixed="true"/>
</anchor>
<anchor mnemonic="BO" descriptor="Body" identity="int">
<identity generator="true"/>
<attribute mnemonic="BOT" descriptor="BodyText" dataRange="varchar(max)">
<layout x="1161.00" y="491.00" fixed="true"/>
</attribute>
<layout x="1052.00" y="465.00" fixed="true"/>
</anchor>
<tie timeRange="datetime">
<anchorRole role="IsTagged" type="PO" identifier="true"/>
<anchorRole role="IsAttached" type="TG" identifier="true"/>
<anchorRole role="BYAuthor" type="US" identifier="false"/>
<knotRole role="Until" type="EXP" identifier="false"/>
<layout x="722.00" y="397.00" fixed="true"/>
</tie>
<tie timeRange="datetime">
<anchorRole role="Contains" type="PO" identifier="true"/>
<anchorRole role="ContainedIn" type="BO" identifier="false"/>
<layout x="975.00" y="576.00" fixed="true"/>
</tie>
<tie>
<anchorRole role="CreatedBy" type="TG" identifier="true"/>
<anchorRole role="Author" type="US" identifier="false"/>
<layout x="755.10" y="195.17" fixed="true"/>
</tie>
<tie>
<anchorRole role="CreatedBy" type="PO" identifier="true"/>
<anchorRole role="Author" type="US" identifier="false"/>
<layout x="890.69" y="369.09" fixed="true"/>
</tie>
<tie>
<anchorRole role="ModifiedBy" type="BO" identifier="true"/>
<anchorRole role="Author" type="US" identifier="false"/>
<layout x="1061.81" y="322.34" fixed="true"/>
</tie>
</schema>
I've implemented a temporal database using SCD type 2 and PostgreSQL Rules and Triggers, and wrapped it in a self-contained package for ActiveRecord: http://github.com/ifad/chronomodel
The design is independent from the language / framework, though - you can create Rules and Triggers manually and the database will take care of the rest. Have a look at https://github.com/ifad/chronomodel/blob/master/README.sql.
Also efficient indexing and querying of temporal data using geometric operators is included as a bonus. :-)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With