Ratio between unique hash keys and range keys in DynamoDB

Is it a problem if I choose my hash key and range key so that the number of unique hash keys is very low (maximum: 1000), while there are many more unique range keys?

Does the ratio between the number of unique hash keys and unique range keys affect retrieval performance?

mustafa.yavuz asked Mar 20 '23 19:03


2 Answers

It should not be a problem to have few hash keys with many range keys for each if:

  1. The number of hash keys is not too low
  2. Your access is randomly spread across the hash keys
  3. You don't need to scale to extreme levels

According to the AWS Developer Guidelines for Working with Tables:

Provisioned throughput is dependent on the primary key selection, and the workload patterns on individual items. When storing data, DynamoDB divides a table's items into multiple partitions, and distributes the data primarily based on the hash key element. The provisioned throughput associated with a table is also divided evenly among the partitions, with no sharing of provisioned throughput across partitions.

Essentially, each hash key resides on a single node (i.e. server). Actually, it is redundantly stored to prevent data loss, but that can be ignored for this discussion. When you provision throughput you are indirectly determining the number of nodes to spread the hash keys across. However, no matter how much throughput you provision, it is limited for a single hash key by what a single node can handle.
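To make the placement idea concrete, here is a minimal sketch of hash-based partitioning. It is only an illustration: DynamoDB's real hash function and partition count are internal, so `partition_for`, the MD5 choice, the 4 partitions, and the `customer-N` key names are all assumptions made up for the example.

```python
import hashlib
from collections import Counter


def partition_for(hash_key: str, num_partitions: int) -> int:
    """Map a hash key to a partition index.

    Illustrative only: this is NOT DynamoDB's actual hash function.
    The range key is deliberately not an input, so every item that
    shares a hash key lands on the same partition.
    """
    digest = hashlib.md5(hash_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def per_partition_throughput(provisioned: float, num_partitions: int) -> float:
    """Provisioned throughput is split evenly, with no sharing between partitions."""
    return provisioned / num_partitions


# Hypothetical table: 1000 distinct hash keys spread over 4 partitions.
keys = [f"customer-{i}" for i in range(1000)]
placement = Counter(partition_for(k, 4) for k in keys)

print(dict(placement))                      # how many hash keys each partition holds
print(per_partition_throughput(1000, 4))    # 250.0 units available per partition
```

The point to notice is that a single hash key can never draw on more than one partition's share, regardless of how many range keys sit under it.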

To explain my three caveats:

1. The number of hash keys is not too low
You mention a maximum of 1000 hash keys, but the concern is what the minimum is. If, for example, there were only 10 hash keys, then you would quickly reach the throughput limit for each key and would not actually realize the provisioned throughput.

2. Your access is randomly spread across the hash keys
It doesn't matter how many hash keys you have if a small number of them are "hot". That is, if you are frequently reading or writing to only a small subset of the hash keys, then you will reach the throughput limit of the nodes those keys are stored on.

3. You don't need to scale to extreme levels
Even assuming you have 1000 distinct hash keys and your access is randomly spread across them, if you need to scale to extreme levels you will eventually reach a point where each hash key is on a separate node. That is, if you provision enough throughput that each hash key is allocated to a separate node (i.e. you have 1000+ nodes), then any throughput provisioned beyond that level will not be realized because you will reach the limit of each node for each key.
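The first and third caveats boil down to the same arithmetic, which can be sketched as follows. The numbers are hypothetical: `per_key_limit` stands in for "what a single node can handle", which AWS does not express as a fixed figure here, and the formula assumes access is spread evenly across the keys.

```python
def realized_throughput(provisioned: int, distinct_hash_keys: int, per_key_limit: int) -> int:
    """Upper bound on usable throughput under the model described above.

    Assumes uniform access across hash keys. Each hash key lives on one
    node, so it can use at most per_key_limit, no matter how much
    throughput the table has provisioned.
    """
    return min(provisioned, distinct_hash_keys * per_key_limit)


# Hypothetical node limit of 1000 units per second.
print(realized_throughput(50_000, 10, 1_000))       # 10000: only 10 keys, most capacity unused
print(realized_throughput(50_000, 1000, 1_000))     # 50000: 1000 keys is plenty at this scale
print(realized_throughput(2_000_000, 1000, 1_000))  # 1000000: extreme scale hits the ceiling
```

With 1000 well-distributed keys the ceiling only matters once the provisioned figure exceeds 1000 times the per-node limit.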


The ratio of range keys to hash keys should have little to no effect on get, scan and query performance.

It is my understanding that the range keys for each hash key are efficiently stored in some kind of index that will scale well. However, remember that all the rows for a given hash key are stored together on the same node, so you can reach a point where there is too much data for a given hash key. The AWS documentation on Limits in DynamoDB states:

For a table with local secondary indexes, there is a limit on item collection sizes: For every distinct hash key value, the total sizes of all table and index items cannot exceed 10 GB. Depending on your item sizes, this may constrain the number of range keys per hash value.
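As a rough back-of-the-envelope check of how that limit constrains range keys, here is a small sketch. The item sizes are made up, the function name is hypothetical, and treating 10 GB as 10 * 1024**3 bytes is an assumption; the quoted limit does not say which definition of GB applies.

```python
# Assumption: 10 GB interpreted as 10 * 1024**3 bytes.
ITEM_COLLECTION_LIMIT_BYTES = 10 * 1024**3


def max_range_keys_per_hash_key(table_item_bytes: int, index_item_bytes: int = 0) -> int:
    """Estimate how many items fit under one hash key value.

    table_item_bytes: average size of one item in the table.
    index_item_bytes: combined size of that item's copies in all
                      local secondary indexes (0 if projecting nothing extra).
    """
    return ITEM_COLLECTION_LIMIT_BYTES // (table_item_bytes + index_item_bytes)


# Hypothetical 1 KiB items.
print(max_range_keys_per_hash_key(1024))         # 10485760
print(max_range_keys_per_hash_key(1024, 1024))   # 5242880, index copies halve the budget
```

Note that the quoted limit applies only to tables with local secondary indexes.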

Jeff Walker Code Ranger answered May 03 '23 22:05


As far as I know, this doesn't matter. The load distribution depends on the "frequency" of access and not on the "possible combinations". If your access is uniformly distributed across the 1000 keys you are talking about, then it is OK. This means the probability of fetching by key1 should be similar to the probability of fetching by key10 or key100. Internally, I guess they would be bucketing your 1000 keys into, say, 3 groups, and these groups "might" be served by 3 machines. You need to ensure that your access is nearly uniform so that all 3 machines get a uniform load share.
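A quick way to see why access frequency matters more than key count is to compute each machine's share of the requests. This is a sketch under the guess above: the 3 machines, the round-robin key placement, and the request counts are all assumptions for illustration, not how DynamoDB actually assigns keys.

```python
from collections import Counter


def machine_load_share(access_counts, num_machines=3):
    """Return each machine's fraction of total requests.

    access_counts[i] is the number of requests for key i. Keys are
    assigned to machines round-robin by index, which is only a
    stand-in for the real (unknown) placement.
    """
    load = Counter()
    for i, count in enumerate(access_counts):
        load[i % num_machines] += count
    total = sum(access_counts)
    return [load[m] / total for m in range(num_machines)]


# 1000 keys, every key fetched equally often.
uniform = [10] * 1000

# Same 1000 keys, but key 0 is "hot".
skewed = [10] * 1000
skewed[0] = 10_000

print(machine_load_share(uniform))  # roughly one third each
print(machine_load_share(skewed))   # the machine holding key 0 takes about two thirds
```

Both cases have the same number of distinct keys; only the access pattern differs.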

Sony Kadavan answered May 03 '23 23:05