Currently, I see that when I use versioning in DynamoDB, it changes the version number but the new entry will replace the old entry; ie:
old
{ object:one, name:"hey", version:1} new
{ object:one, name:"ho", version:2} What I want is to have BOTH entries in the db; ie:
old
{ object:one, name:"hey", version:1 } new
{ object:one, name:"hey", version:1} { object:one, name:"ho", version:2} Any way to accomplish this?
I've been experimenting and calculating what's most efficient in terms of read/write units and cost, considering race conditions where updates occur while a version is being logged, and avoiding of duplication of data. I've narrowed down a couple of possible solutions. You'd have to consider your best variation.
The basic concepts revolve around considering version 0 as the latest version. Also, we will use a revisions key that will list how many revisions exist before this item, but also will be used to determine the current version of the item (version = revisions + 1). Being able to calculate how many versions exists is a requirement and, in my opinion, revisions fulfills that need as well as a value that can be presented to the user.
So the first row will be created with version: 0 and revisions: 0. While this is technically the first version (v1), we don't apply a version number until it's archived. When this row changes, version stays at 0, which still denotes latest, and revisions is incremented to 1. A new row is created with all the previous values, with the exception that now that row denotes version: 1.
To summarize:
On item creation:
revisions: 0 and version 0 On item update or overwrite:
revisions version: 0 to the new version which can easily be calculated as version: revisions + 1.Here's what the conversion would look like on a table with just a primary key:
Primary Key: id
id color 9501 violet 9502 cyan 9503 magenta Primary Key: id+version
id version revisions color 9501 0 6 violet 9501 1 0 red 9501 2 1 orange 9501 3 2 yellow 9501 4 3 green 9501 5 4 blue 9501 6 5 indigo Here's converting a table that already uses a sort key:
Primary key: id+date
id date color 9501 2018-01 violet 9501 2018-02 cyan 9501 2018-03 black Primary key: id+date_ver
id date_ver revisions color 9501 2018-01__v0 6 violet 9501 2018-01__v1 0 red 9501 2018-01__v2 1 orange 9501 2018-01__v3 2 yellow 9501 2018-01__v4 3 green 9501 2018-01__v5 4 blue 9501 2018-01__v6 5 indigo Alternative #2:
id date_ver revisions color 9501 2018-01 6 violet 9501 2018-01__v1 0 red 9501 2018-01__v2 1 orange 9501 2018-01__v3 2 yellow 9501 2018-01__v4 3 green 9501 2018-01__v5 4 blue 9501 2018-01__v6 5 indigo We actually have the option of either putting previous versions in the same table or separate them into their own table. Both options have their different advantages and disadvantages.
number or appended to an existing sort key as a string Advantages:
Disadvantages:
revision key to both tablesversion. The primary table would always have version: 0. Use of this key on the primary table isn't mandatory.Advantages:
get requests do not change.Disadvantages:
Regardless of how you decide to partition the data, now we have to decide how create the revision rows. Here are a couple of different methods:
Summary: Get the current version of the row. Perform both an update on the current row and insert the previous version with one transaction.
To avoid race conditions, we need to write both the update and insert in the same operation using TransactWriteItems. Also, we need to make sure that the version we are updating is the right version by the time the request reaches database server. We achieve this by either one of two checks, or even both:
Update command in TransactItems, the ConditionExpression must check that the revision in the row to be updated matches the revision in the object we performed a Get on before.Put command in TransactItems, the ConditionExpression checks to make sure the row does not already exist.Cost
Notes:
Summary: Get and store current row. When overwriting or updating a row, check against current revision and increment revisions. Insert previously stored row with version number.
Perform an update with
{ UpdateExpression: 'SET revisions = :newRevisionCount', ExpressionAttributeValues: { ':newRevisionCount': previousRow.revisions + 1, ':expectedRevisionCount': previousRow.revisions, }, ConditionExpression: 'revisions = :expectedRevisionCount', } We can use the same ConditionExpression with put when overwriting a previously existing row.
In the response, we are watching for ConditionalCheckFailedException. If this is returned, that means the revision was already changed by another process and we need to repeat the process from the beginning or abort entirely. If there are no exceptions, then we can insert the previous stored row after update the value on your version attribute as appropriate (numeric or string).
Cost
Summary: Perform a "blind" update on the v0 row while incrementing revisions and requesting the old attributes. Use the return value to create a new row with the version number.
Perform an update-item with
{ UpdateExpression: 'ADD revisions :revisionIncrement', ExpressionAttributeValues: { ':revisionIncrement': 1, }, ReturnValues: 'ALL_OLD', } The ADD action will automatically create revisions if it doesn't exist and will consider it 0. One nice benefit of ReturnValues is:
There is no additional cost associated with requesting a return value aside from the small network and processing overhead of receiving a larger response. No read capacity units are consumed.
In the update response, the Attributes value will be the data from the old record. The version of this record is the value of Attributes.revisions + 1. Update the value on your version attribute as appropriate (numeric or string).
Now you can insert this record into your target table.
Cost
Notes:
Attributes length is limited to 65535.Summary: Perform "blind" updates or inserts while incrementing revisions. Use a Lambda trigger watching for changes to revision to insert revisions asynchronously.
Perform an update with
{ UpdateExpression: 'ADD revisions :revisionIncrement', ExpressionAttributeValues: { ':revisionIncrement': 1, }, } The ADD action will automatically create revisions if it doesn't exist and will consider it 0.
For overwriting records with put increment revisions value based on a previous get request.
Configure a DynamoDB Stream view type to return both new and old images. Setup a Lambda trigger against the database table. Here's a piece of sample code for NodeJS that would compare the old and new images and call a function to write the revisions in batch.
/** * @param {AWSLambda.DynamoDBStreamEvent} event * @return {void} */ export function handler(event) { const oldRevisions = event.Records .filter(record => record.dynamodb.OldImage && record.dynamodb.NewImage && record.dynamodb.OldImage.revision.N !== record.dynamodb.NewImage.revision.N) .map(record => record.dynamodb.OldImage); batchWriteRevisions(oldRevisions); } This is just sample, but production code would likely include more checks.
Cost
Notes:
For my use cases, I'm already using a DynamoDB Streams and I don't expect users to request versioned rows all that often. I also can let users wait a bit for the revisions to be ready since it's asynchronous. That makes using a second table and the automated lambda process the more ideal solution for me.
For the asynchronous options there are some points of failure. Still, it's something that you can either retry immediately on the on-demand requests, or schedule for later for the DynamoDB Stream solution.
If anybody has any other solutions or critiques, please comment. Thanks!
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With