Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I implement versioning without replacing with previous record in DynamoDB?

Currently, I see that when I use versioning in DynamoDB, it changes the version number but the new entry will replace the old entry; ie:

old

{ object:one, name:"hey", version:1} 

new

{ object:one, name:"ho", version:2} 

What I want is to have BOTH entries in the db; ie:

old

{ object:one, name:"hey", version:1 } 

new

{ object:one, name:"hey", version:1} { object:one, name:"ho", version:2} 

Any way to accomplish this?

like image 787
iCodeLikeImDrunk Avatar asked Jun 17 '14 22:06

iCodeLikeImDrunk


1 Answers

I've been experimenting and calculating what's most efficient in terms of read/write units and cost, considering race conditions where updates occur while a version is being logged, and avoiding of duplication of data. I've narrowed down a couple of possible solutions. You'd have to consider your best variation.

The basic concepts revolve around considering version 0 as the latest version. Also, we will use a revisions key that will list how many revisions exist before this item, but also will be used to determine the current version of the item (version = revisions + 1). Being able to calculate how many versions exists is a requirement and, in my opinion, revisions fulfills that need as well as a value that can be presented to the user.

So the first row will be created with version: 0 and revisions: 0. While this is technically the first version (v1), we don't apply a version number until it's archived. When this row changes, version stays at 0, which still denotes latest, and revisions is incremented to 1. A new row is created with all the previous values, with the exception that now that row denotes version: 1.

To summarize:

On item creation:

  • Create item with revisions: 0 and version 0

On item update or overwrite:

  • Increment revisions
  • Insert old row exactly as before, but change version: 0 to the new version which can easily be calculated as version: revisions + 1.

Here's what the conversion would look like on a table with just a primary key:

Primary Key: id

  id  color 9501  violet 9502  cyan 9503  magenta 

Primary Key: id+version

id    version  revisions  color 9501        0          6  violet 9501        1          0  red 9501        2          1  orange 9501        3          2  yellow 9501        4          3  green 9501        5          4  blue 9501        6          5  indigo 

Here's converting a table that already uses a sort key:

Primary key: id+date

id    date     color 9501  2018-01  violet 9501  2018-02  cyan 9501  2018-03  black 

Primary key: id+date_ver

id    date_ver     revisions  color 9501  2018-01__v0          6  violet 9501  2018-01__v1          0  red 9501  2018-01__v2          1  orange 9501  2018-01__v3          2  yellow 9501  2018-01__v4          3  green 9501  2018-01__v5          4  blue 9501  2018-01__v6          5  indigo 

Alternative #2:

id    date_ver     revisions  color 9501  2018-01              6  violet 9501  2018-01__v1          0  red 9501  2018-01__v2          1  orange 9501  2018-01__v3          2  yellow 9501  2018-01__v4          3  green 9501  2018-01__v5          4  blue 9501  2018-01__v6          5  indigo 

We actually have the option of either putting previous versions in the same table or separate them into their own table. Both options have their different advantages and disadvantages.

Using the same table:

  • Primary key is composed of Partition key and Sort key
  • Version must be used in sort key either alone as a number or appended to an existing sort key as a string

Advantages:

  • All data exists in one table

Disadvantages:

  • Possibly limits your use of the table sort keys
  • Versioning uses the same writes units as your primary table
  • Sort keys can only be configured during table creation
  • Possibly need to readjust your code to query against v0
  • Previous versions will also be affected by indexes

Using a secondary tables:

  • Add revision key to both tables
  • If not using a sort key, build a sort key for secondary table called version. The primary table would always have version: 0. Use of this key on the primary table isn't mandatory.
  • If already using a sort key, see "Alternative #2" above

Advantages:

  • Primary table does not need to change any keys or be recreated. get requests do not change.
  • Primary table keeps its sort key
  • Secondary table can have independent read and write capacity units
  • Secondary table has its own indexes

Disadvantages:

  • Requires managing of a second table

Regardless of how you decide to partition the data, now we have to decide how create the revision rows. Here are a couple of different methods:

On-demand, synchronous item-overwrite/update and revision-insert

Summary: Get the current version of the row. Perform both an update on the current row and insert the previous version with one transaction.

To avoid race conditions, we need to write both the update and insert in the same operation using TransactWriteItems. Also, we need to make sure that the version we are updating is the right version by the time the request reaches database server. We achieve this by either one of two checks, or even both:

  1. In the Update command in TransactItems, the ConditionExpression must check that the revision in the row to be updated matches the revision in the object we performed a Get on before.
  2. In the Put command in TransactItems, the ConditionExpression checks to make sure the row does not already exist.

Cost

  • 1 Read capacity unit per 4K for Get on v0
  • 1 Write capacity unit for preparing TransactWriteItem
  • 1 Write capacity unit per 1K for Put/Update on v0
  • 1 Write capacity unit per 1K for Put on revision
  • 1 Write capacity unit for committing TransactWriteItem

Notes:

  • Items are limited to 400KB

On-demand, asynchronous item-get, item-overwrite/update, and revision-insert

Summary: Get and store current row. When overwriting or updating a row, check against current revision and increment revisions. Insert previously stored row with version number.

Perform an update with

{   UpdateExpression: 'SET revisions = :newRevisionCount',   ExpressionAttributeValues: {     ':newRevisionCount': previousRow.revisions + 1,     ':expectedRevisionCount': previousRow.revisions,   },   ConditionExpression: 'revisions = :expectedRevisionCount', } 

We can use the same ConditionExpression with put when overwriting a previously existing row.

In the response, we are watching for ConditionalCheckFailedException. If this is returned, that means the revision was already changed by another process and we need to repeat the process from the beginning or abort entirely. If there are no exceptions, then we can insert the previous stored row after update the value on your version attribute as appropriate (numeric or string).

Cost

  • 1 read capacity unit per 4K for Get on v0
  • 1 write capacity unit per 1KB for Put/UpdateItem on v0
  • 1 write capacity unit per 1KB for Put on revision

On-demand, asynchronous blind item-update and revision-insert

Summary: Perform a "blind" update on the v0 row while incrementing revisions and requesting the old attributes. Use the return value to create a new row with the version number.

Perform an update-item with

{   UpdateExpression: 'ADD revisions :revisionIncrement',   ExpressionAttributeValues: {     ':revisionIncrement': 1,   },   ReturnValues: 'ALL_OLD', } 

The ADD action will automatically create revisions if it doesn't exist and will consider it 0. One nice benefit of ReturnValues is:

There is no additional cost associated with requesting a return value aside from the small network and processing overhead of receiving a larger response. No read capacity units are consumed.

In the update response, the Attributes value will be the data from the old record. The version of this record is the value of Attributes.revisions + 1. Update the value on your version attribute as appropriate (numeric or string).

Now you can insert this record into your target table.

Cost

  • 1 write capacity unit per 1KB for Update on v0
  • 1 write capacity unit per 1KB for Put on revision

Notes:

  • The returned object's Attributes length is limited to 65535.
  • No solution for overwriting rows.

Automated asynchronous revision-insert

Summary: Perform "blind" updates or inserts while incrementing revisions. Use a Lambda trigger watching for changes to revision to insert revisions asynchronously.

Perform an update with

{   UpdateExpression: 'ADD revisions :revisionIncrement',   ExpressionAttributeValues: {     ':revisionIncrement': 1,   }, } 

The ADD action will automatically create revisions if it doesn't exist and will consider it 0.

For overwriting records with put increment revisions value based on a previous get request.

Configure a DynamoDB Stream view type to return both new and old images. Setup a Lambda trigger against the database table. Here's a piece of sample code for NodeJS that would compare the old and new images and call a function to write the revisions in batch.

/**  * @param {AWSLambda.DynamoDBStreamEvent} event  * @return {void}  */ export function handler(event) {   const oldRevisions = event.Records     .filter(record => record.dynamodb.OldImage       && record.dynamodb.NewImage       && record.dynamodb.OldImage.revision.N !== record.dynamodb.NewImage.revision.N)     .map(record => record.dynamodb.OldImage);   batchWriteRevisions(oldRevisions); } 

This is just sample, but production code would likely include more checks.

Cost

  • 1 read capacity unit per 4K for get on v0 (only when overwriting)
  • 1 write capacity unit per 1KB for Put/Update on v0
  • 1 DynamoDB Stream read request unit per GetRecords command
  • 1 write capacity unit per 1KB for Put of revision

Notes:

  • DynamoDB Stream shard data expires after 24 hours
  • DynamoDB Stream read request units are independent of table read capacity units
  • Use of Lambda functions has its own pricing
  • Changing stream view type requires disabling and re-enabling the stream
  • Works with Write, Put, BatchWriteItems, TransactWriteItems commands

For my use cases, I'm already using a DynamoDB Streams and I don't expect users to request versioned rows all that often. I also can let users wait a bit for the revisions to be ready since it's asynchronous. That makes using a second table and the automated lambda process the more ideal solution for me.

For the asynchronous options there are some points of failure. Still, it's something that you can either retry immediately on the on-demand requests, or schedule for later for the DynamoDB Stream solution.

If anybody has any other solutions or critiques, please comment. Thanks!

like image 105
ShortFuse Avatar answered Oct 07 '22 11:10

ShortFuse