 

Concurrent updates in DynamoDB, are there any guarantees?

In general, if I want to be sure what happens when several threads make concurrent updates to the same item in DynamoDB, I should use conditional updates (i.e., "optimistic locking"). I know that. But I was wondering whether there are any other cases where I can be sure that concurrent updates to the same item all survive.
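
For reference, a minimal sketch of such a conditional update with the Node.js SDK, assuming a numeric version attribute for optimistic locking; the table name, key shape, and attribute names here are made up for illustration:

const aws = require('aws-sdk');
const ddb = new aws.DynamoDB.DocumentClient();

// Optimistic-locking sketch: the write succeeds only if the item's "version"
// attribute still holds the value we read earlier. All names are hypothetical.
function updateWithVersionCheck(key, expectedVersion, newValue) {
    return ddb.update({
        TableName: 'my-table',
        Key: {x: key},
        UpdateExpression: 'SET #val = :new, version = version + :one',
        ConditionExpression: 'version = :expected',
        ExpressionAttributeNames: {'#val': 'value'},
        ExpressionAttributeValues: {
            ':new': newValue,
            ':expected': expectedVersion,
            ':one': 1
        }
    }).promise(); // rejects with ConditionalCheckFailedException if another writer won
}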

For example, in Cassandra, making concurrent updates to different attributes of the same item is fine, and both updates will eventually be available to read. Is the same true in DynamoDB? Or is it possible that only one of these updates survive?

A very similar question is what happens if I add, concurrently, two different values to a set or list in the same item. Am I guaranteed that I'll eventually see both values when I read this set or list, or is it possible that one of the additions will mask out the other during some sort of DynamoDB "conflict resolution" protocol?
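
To make that second question concrete, here is a hedged sketch of two writers adding different values to the same set attribute with ADD; the table, key, and attribute names are assumptions, not taken from any documentation:

const aws = require('aws-sdk');
const ddb = new aws.DynamoDB.DocumentClient();

// Each call merges one value into a (hypothetical) string-set attribute "tags".
function addTag(key, tag) {
    return ddb.update({
        TableName: 'my-table',
        Key: {x: key},
        UpdateExpression: 'ADD tags :t', // ADD merges elements into an existing set
        ExpressionAttributeValues: {':t': ddb.createSet([tag])}
    }).promise();
}

// The question is whether both values survive when these run concurrently:
// Promise.all([addTag('myKey', 'a'), addTag('myKey', 'b')]);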

I see a version of my second question was already asked here in the past: Are DynamoDB "set" values CDRTs? But the answer referred to a not-very-clear FAQ entry which no longer exists. What I would most like to see as an answer to my question is official DynamoDB documentation that states how DynamoDB handles concurrent updates when neither "conditional updates" nor "transactions" are involved, and in particular what happens in the above two examples. Absent such official documentation, does anyone have any real-world experience with such concurrent updates?

Asked Apr 02 '19 by Nadav Har'El

1 Answer

I just had the same question and came across this thread. Given that there was no answer I decided to test it myself.

The answer, as far as I can observe, is that as long as you are updating different attributes, all the updates will eventually succeed. It does take a little longer the more updates I push to the item, so they appear to be written in sequence rather than in parallel.

I also tried updating a single List attribute in parallel, and this expectedly failed: the resulting list, once all queries had completed, was broken and only had some of the entries pushed to it.
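
The list variant of the test isn't included in the script below, but a plausible form of it, using list_append, might look like this sketch (the "entries" attribute name is an assumption):

const aws = require('aws-sdk');
const ddb = new aws.DynamoDB.DocumentClient();

// Hypothetical parallel list append against the same item.
function appendEntry(key, value) {
    return ddb.update({
        TableName: 'concurrency-test',
        Key: {x: key},
        UpdateExpression: 'SET entries = list_append(if_not_exists(entries, :empty), :v)',
        ExpressionAttributeValues: {
            ':v': [value],
            ':empty': []
        }
    }).promise();
}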

The test I ran was pretty rudimentary, and I might be missing something, but I believe the conclusion to be correct.

For completeness, here is the Node.js script I used.

const aws = require('aws-sdk');
const ddb = new aws.DynamoDB.DocumentClient();

// CLI arguments: the partition key to write to, and how many concurrent
// updates to fire at that one item.
const key = process.argv[2];
const num = Number(process.argv[3]);


run().then(() => {
    console.log('Done');
});

async function run() {
    // Kick off all updates without awaiting, so they run concurrently.
    const p = [];
    for (let i = 0; i < num; i++) {
        p.push(ddb.update({
            TableName: 'concurrency-test',
            Key: {x: key},
            // Each request sets a different attribute (k0, k1, ...) on the same item.
            UpdateExpression: 'SET #k = :v',
            ExpressionAttributeValues: {
                ':v': `test-${i}`
            },
            ExpressionAttributeNames: {
                '#k': `k${i}`
            }
        }).promise());
    }

    await Promise.all(p);

    // Read the item back and count its attributes: if every update survived,
    // this prints num + 1 (one attribute per update, plus the key itself).
    const response = await ddb.get({TableName: 'concurrency-test', Key: {x: key}}).promise();
    const item = response.Item;

    console.log('keys', Object.keys(item).length);
}

Run like so:

node index.js {key} {number}
node index.js myKey 10

Timings:

  • 10 updates: ~1.5s
  • 100 updates: ~2s
  • 1000 updates: ~10-20s (fluctuated a lot)

It's worth noting that the metrics show a lot of throttled events, but these are handled internally by the Node.js SDK using exponential backoff, so once the dust settled, everything was written as expected.
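
For reference, that retry behavior can be tuned when constructing the client; maxRetries and retryDelayOptions are aws-sdk v2 options, and the values below are just examples:

const aws = require('aws-sdk');

// The SDK retries throttled requests with exponential backoff by default;
// this just makes the knobs explicit. Values here are examples, not tuning advice.
const dynamo = new aws.DynamoDB({
    maxRetries: 10,                 // number of retries before giving up
    retryDelayOptions: {base: 50}   // base delay in ms for the backoff
});
const ddb = new aws.DynamoDB.DocumentClient({service: dynamo});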

Answered Oct 11 '22 by Carl