
Reliability of atomic counters in DynamoDB

I am considering using Amazon DynamoDB in my application, and I have a question regarding the reliability of its atomic counters.

I'm building a distributed application that needs to concurrently, and consistently, increment/decrement a counter stored in a DynamoDB attribute. I was wondering how reliable DynamoDB's atomic counter is in a heavily concurrent environment where the concurrency level is extremely high (say, an average rate of 20k concurrent hits; to get the idea, that would be almost 52 billion increments/decrements per month).

The counter should be super-reliable and never miss a hit. Has anyone tested DynamoDB in such a critical environment?

Thanks

Mark asked Feb 20 '12


People also ask

What is DynamoDB atomic counter?

You can use the UpdateItem operation to implement an atomic counter—a numeric attribute that is incremented, unconditionally, without interfering with other write requests. (All write requests are applied in the order in which they were received.) With an atomic counter, the updates are not idempotent.
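As a sketch of what that UpdateItem call looks like, the snippet below builds the request parameters for an atomic increment. The table name (`Hits`), key attribute (`pk`), and counter attribute (`hits`) are hypothetical placeholders, not names from the question:

```python
# Sketch of an atomic-counter UpdateItem request. The table name ("Hits"),
# key attribute ("pk"), and counter attribute ("hits") are made up for
# illustration; adapt them to your own schema.
def build_increment(table, key, delta=1):
    return {
        "TableName": table,
        "Key": {"pk": {"S": key}},
        # ADD applies the delta server-side, so concurrent writers never
        # overwrite each other -- but note that retries are not idempotent:
        # a retried request that already succeeded counts twice.
        "UpdateExpression": "ADD hits :d",
        "ExpressionAttributeValues": {":d": {"N": str(delta)}},
        "ReturnValues": "UPDATED_NEW",
    }

# With boto3 this would be sent as:
#   boto3.client("dynamodb").update_item(**build_increment("Hits", "page-1"))
```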

Does DynamoDB support in place atomic updates?

Q: Does DynamoDB support in-place atomic updates? Amazon DynamoDB supports fast in-place updates. You can increment or decrement a numeric attribute in a row using a single API call. Similarly, you can atomically add to or remove from sets, lists, or maps.

Is DynamoDB reliable?

DynamoDB is a reliable system that helps small, medium and large enterprises scale their applications. You can use it for mobile and web apps. It provides the option to backup, restore and secure data.

What is the durability of DynamoDB?

The 99.999999999% durability figure comes from Amazon's estimate of what S3 is designed to achieve and there is no related SLA. Note that Amazon S3 is designed for 99.99% availability but the SLA kicks in at 99.9%.


1 Answer

DynamoDB gets its scaling properties by splitting the keys across multiple servers, similar to how other distributed databases like Cassandra and HBase scale. Increasing the provisioned throughput on DynamoDB just spreads your data across more servers, so each server now handles roughly (total concurrent connections / number of servers). Take a look at their FAQ for an explanation of how to achieve maximum throughput:

Q: Will I always be able to achieve my level of provisioned throughput?

Amazon DynamoDB assumes a relatively random access pattern across all primary keys. You should set up your data model so that your requests result in a fairly even distribution of traffic across primary keys. If you have a highly uneven or skewed access pattern, you may not be able to achieve your level of provisioned throughput.

When storing data, Amazon DynamoDB divides a table into multiple partitions and distributes the data based on the hash key element of the primary key. The provisioned throughput associated with a table is also divided among the partitions; each partition's throughput is managed independently based on the quota allotted to it. There is no sharing of provisioned throughput across partitions. Consequently, a table in Amazon DynamoDB is best able to meet the provisioned throughput levels if the workload is spread fairly uniformly across the hash key values. Distributing requests across hash key values distributes the requests across partitions, which helps achieve your full provisioned throughput level.

If you have an uneven workload pattern across primary keys and are unable to achieve your provisioned throughput level, you may be able to meet your throughput needs by increasing your provisioned throughput level further, which will give more throughput to each partition. However, it is recommended that you consider modifying your request pattern or your data model in order to achieve a relatively random access pattern across primary keys.

This means that a single, directly incremented key will not scale, since that key must live on one partition. There are other ways to handle this problem: for example, in-memory aggregation with a periodic flush to DynamoDB (though this can have reliability issues if the aggregator crashes before flushing), or a sharded counter, where increments are spread over multiple keys and the total is read back by fetching and summing all of the shards (http://whynosql.com/scaling-distributed-counters/).
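The sharded-counter idea can be sketched as below. The key format, shard count, and function names are assumptions for illustration; the actual DynamoDB reads/writes are omitted so the sharding logic itself is clear:

```python
import random

NUM_SHARDS = 10  # assumed shard count; size it to your peak write rate

def shard_key(counter_name, shard_id):
    # Each shard is a separate item, e.g. "pageviews#3", so writes land on
    # different hash keys and therefore different partitions.
    return f"{counter_name}#{shard_id}"

def pick_shard(counter_name):
    # Each increment goes to a randomly chosen shard; over many writes the
    # load spreads roughly evenly across all NUM_SHARDS items.
    return shard_key(counter_name, random.randrange(NUM_SHARDS))

def total(shard_values):
    # Reading the counter means fetching every shard item (e.g. with a
    # Query or BatchGetItem) and summing their values.
    return sum(shard_values)
```

The trade-off is that reads become more expensive (N item fetches instead of one) in exchange for writes that scale across partitions.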

gigq answered Sep 30 '22