
Understanding hot keys in DynamoDB

I have a table with the default capacities, i.e. 5 RCUs and 5 WCUs. According to the docs, this would result in DynamoDB creating only one partition.

Table Structure:

  • Partition Key: item_type
  • Sort Key: item_id

I have some item_types with one or two item ids and some with 100,000. We have around 10 million records in total.

I am trying to understand: if there is only one partition, how would this create the problem of hot keys? And what is a hot key in general?

Azhar asked Apr 16 '18 12:04



1 Answer

I know it's an old question, but I found some useful information.

As described in Partitions and Data distributions:

DynamoDB allocates additional partitions to a table in the following situations:
- If you increase the table's provisioned throughput settings beyond what the existing partitions can support.
- If an existing partition fills to capacity and more storage space is required.

This means that you can't assume how many partitions your table is using. In fact, DynamoDB's docs say very little about physical partitions and focus instead on the partition key of a table.
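To make the two conditions above concrete, here is a rough sketch of how one might estimate a lower bound on the partition count. The per-partition limits used here (3,000 RCUs, 1,000 WCUs, 10 GB) are assumptions taken from commonly cited AWS guidance, not from the passage quoted above, and the function name is hypothetical. DynamoDB does not expose the real partition count, so treat this as an illustration only.

```python
import math

# Assumed per-partition limits (commonly cited figures, not guaranteed by the
# quoted documentation and subject to change).
MAX_RCU_PER_PARTITION = 3000
MAX_WCU_PER_PARTITION = 1000
MAX_GB_PER_PARTITION = 10


def estimate_partitions(rcu, wcu, size_gb):
    """Rough estimate of how many partitions a table needs.

    Takes the larger of the throughput-driven and storage-driven counts,
    mirroring the two situations in which partitions are added.
    """
    by_throughput = math.ceil(rcu / MAX_RCU_PER_PARTITION + wcu / MAX_WCU_PER_PARTITION)
    by_size = math.ceil(size_gb / MAX_GB_PER_PARTITION)
    return max(by_throughput, by_size, 1)


# Default capacities with a small table: throughput alone needs one partition.
small_table = estimate_partitions(rcu=5, wcu=5, size_gb=1)

# Same capacities, but 25 GB of data: storage now drives the count.
large_table = estimate_partitions(rcu=5, wcu=5, size_gb=25)
```

Under these assumptions, a table with 5 RCUs and 5 WCUs stays on one partition only while its data fits within the storage limit. With around 10 million records, the total size decides whether that still holds.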

If you read further on that page, there is a detailed explanation of how DynamoDB hashes the partition key to decide which partition stores an item.

How to use a partition key to avoid hot keys?

As described in Designing Partition Keys to Distribute Your Workload Evenly:

The partition key portion of a table's primary key determines the logical partitions in which a table's data is stored. This in turn affects the underlying physical partitions. Provisioned I/O capacity for the table is divided evenly among these physical partitions. Therefore a partition key design that doesn't distribute I/O requests evenly can create "hot" partitions that result in throttling and use your provisioned I/O capacity inefficiently.

Oversimplified, this means that you should typically design your partition key to maximize the number of distinct partition key values relative to the number of records, so that requests spread across many keys.

This isn't always necessary: for example, you can have a large number of records under the same partition key without any problem if they are almost never read or updated and writes to that partition key are rare.

In your case, if you expect a lot of reads/writes to the same item_type, it's better to model your data differently.
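The point that request traffic, not record count, is what makes a key hot can be illustrated with a small simulation. This is a sketch under stated assumptions: DynamoDB's real hash function is internal, so MD5 stands in for it here, and `partition_for`, `requests_per_partition`, the key names, and the partition count of 4 are all hypothetical.

```python
import hashlib
from collections import Counter


def partition_for(partition_key, num_partitions):
    """Map a partition key to a partition index.

    MD5 is only a stand-in for DynamoDB's internal hash function.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def requests_per_partition(requested_keys, num_partitions):
    """Count how many requests land on each partition."""
    return Counter(partition_for(key, num_partitions) for key in requested_keys)


# 900 requests hit one item_type; 100 requests are spread over 100 other types.
traffic = ["book"] * 900 + [f"type-{i}" for i in range(100)]
load = requests_per_partition(traffic, num_partitions=4)
hottest_partition, hottest_count = load.most_common(1)[0]
```

Every request for the same partition key hashes to the same partition, so whichever partition holds "book" receives at least 900 of the 1,000 requests, no matter how many partitions exist. With a single partition the effect is the same: all traffic competes for that partition's share of the provisioned capacity.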


More useful links:

Best Practices for Designing and Using Partition Keys Effectively
Using Write Sharding to Distribute Workloads Evenly
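
The write sharding idea named in the last link can be sketched roughly as follows for the table in the question. This is a minimal illustration, assuming a calculated suffix derived from item_id. The shard count of 10, the `#` separator, and the helper names are hypothetical choices, not something prescribed by the docs or the question.

```python
import hashlib

NUM_SHARDS = 10  # hypothetical; pick based on expected traffic per item_type


def sharded_partition_key(item_type, item_id, num_shards=NUM_SHARDS):
    """Build a partition key such as 'book#7' from item_type and item_id.

    The suffix is calculated from item_id, so the same item always maps to
    the same shard and can still be read directly.
    """
    digest = hashlib.sha256(item_id.encode("utf-8")).digest()
    shard = int.from_bytes(digest[:4], "big") % num_shards
    return f"{item_type}#{shard}"


def all_shard_keys(item_type, num_shards=NUM_SHARDS):
    """List every partition key to query when reading a whole item_type."""
    return [f"{item_type}#{shard}" for shard in range(num_shards)]


key_for_item = sharded_partition_key("book", "id-12345")
keys_to_query = all_shard_keys("book")
```

The trade-off is on the read side: fetching one item only requires recomputing its suffix, but fetching everything for an item_type means querying each shard key and merging the results.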

Christian Paesante answered Sep 27 '22 18:09