I am trying to get my head out of relational database design and into Azure Storage Tables for a Big Data pet project I am about to embark upon.
My first question is around recording version histories of my entities.
Say I have a table called Members and I want to be able to see previous versions of the members details could I set the PartitionKey as their member number and then the RowKey of the datetime stamp that it was updated?
Would this be an effective / recommended approach to this version history problem?
Thanks.
Update: Or would I be better to store the latest entry in the members table and then have a members "history" table that records the changes?
You can take both approaches. Just thinking out loud, here are some of the possible approaches:
Approach 1: Keeping member information and history in a single table (1)
The way this would work is you would keep the PartitionKey
as member's unique identifier and the RowKey
as the timestamp from when the member's information was updated. Since you would want to get the latest information about the member, I would recommend you keep the RowKey
in reverse chronological order using something like:
var rowKey = (DateTime.MaxValue.Ticks - DateTime.UtcNow.Ticks).ToString("d20");
To get the latest information about a member, you would fetch just one row for a given PartitionKey
(i.e. member ID). To get complete history about a member, you would just fetch all rows.
One problem with this approach is that you can't fetch current details about any members without knowing their member IDs.
Approach 2: Keeping member information and history in separate tables
In this approach you will maintain two tables—one for member information (let's call it Member
) and other for member history (let's call it MemberHistory
). The PartitionKey
in Member
table could be member ID and row key could be any arbitrary value (or even null
value). The PartitionKey
in the MemberHistory
table would be member ID and the RowKey
would be the timestamp when the information was updated. Again, you would want to store them in reverse chronological order.
The way this would work is that you always perform an InsertOrUpdate
(or InsertOrMerge
) operation on entities in Members
table, while you would always perform Insert
operation on entities in MembersHistory
table.
While this approach is much neater and gives you the capability of extracting current information about all members, the problem you will run into is that you can't take the advantage of the transaction feature in Azure Table Storage since you're dealing with two tables. So it may happen that your main table gets updated but your history table fails to update, thus you have to cover for that scenario as well.
Approach 3: Keeping member information and history in a single table (2)
In this approach, for the main record and history record, you would keep the PartitionKey
as member ID but keep the RowKey
as empty for the main record and RowKey
as the timestamp (again in reverse chronological order) for the history tables.
To query a member's current record, you would query something like (pseudo code):
PartitionKey == 'Member ID' && RowKey == ''
To query a member's history record, you would query something like (pseudo code):
PartitionKey == 'Member ID' && RowKey != ''
To query all members for their current record, you would query something like (pseudo code):
RowKey == ''
Please note that the query above will do a full table scan so be prepared for continuation tokens and some performance degradation.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With