
Same partition key's data distribution in DynamoDB


From what I understand, DynamoDB tries to put items with the same partition key into the same partition. My question is: how does the hashing work when that partition is full and gets split into two partitions?

For example, a table has items with partition key A, and DynamoDB puts all of them into the same partition P. When P fills up, DynamoDB splits P into P1 and P2. Now the client inserts a new item I with partition key A. How does DynamoDB decide which partition (P1 or P2) to insert I into?

Dr.Pro Avatar asked Jul 08 '17 05:07


2 Answers

All items with the same partition key are stored together, and for composite partition keys, are ordered by the sort key value. DynamoDB splits partitions by sort key if the collection size grows bigger than 10 GB.

Source: https://aws.amazon.com/blogs/database/choosing-the-right-dynamodb-partition-key/

So in that case, DynamoDB uses the sort key to decide which partition an item is stored in.

However, I don't know how it handles the case in which the table has no sort key.

I would also guess that access is no longer constant-time in that case: finding the right partition would take logarithmic time with respect to the number of partitions that hold this partition key. The alternative, hashing the partition key and the sort key together into a single 'merged' hash key, would not keep the items sorted, because items with contiguous sort keys would end up in different partitions.
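
To make the idea concrete, here is a minimal sketch of routing by sort-key range within a single partition key. It is only a toy model: DynamoDB's internals are not public, and the class name `ItemCollectionRouter`, the split point, and the partition names are all made up for illustration. The lookup uses binary search, which is where the guessed logarithmic cost would come from.

```python
import bisect


class ItemCollectionRouter:
    """Toy model of one partition key whose items are split across
    several partitions by sort-key range. Hypothetical, not DynamoDB's
    actual implementation."""

    def __init__(self, split_points, partitions):
        # split_points must be sorted; N split points define N + 1 ranges.
        if len(partitions) != len(split_points) + 1:
            raise ValueError("need exactly one more partition than split points")
        self.split_points = list(split_points)
        self.partitions = list(partitions)

    def partition_for(self, sort_key):
        # Binary search over the split points: O(log n) in the number
        # of partitions. A sort key equal to a split point goes to the
        # partition on the right-hand side of that split.
        index = bisect.bisect_right(self.split_points, sort_key)
        return self.partitions[index]


# P was split into P1 and P2 at the (assumed) sort key "2017-07".
router = ItemCollectionRouter(split_points=["2017-07"], partitions=["P1", "P2"])
early = router.partition_for("2017-03")
late = router.partition_for("2017-09")
```

Because each partition owns a contiguous range of sort keys, items with neighbouring sort keys stay together, which is what keeps range queries on the sort key cheap.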

Gonzalo Solera Avatar answered Sep 30 '22 13:09


The partition key mainly determines where the data is physically stored. DynamoDB applies a consistent hashing function to the partition key to distribute your data across different partitions (physical storage). To read an item by its partition key, DynamoDB hashes the key with the same function to find the correct partition, then fetches the data from that partition. The sort key, in turn, is used to order the data inside each partition.
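
As a rough illustration of that lookup, the sketch below hashes a partition key into a fixed hash space and maps it to the partition that owns the corresponding range. Everything specific here is an assumption: DynamoDB's hash function is internal, so MD5, the equally sized ranges, and the function name `partition_for` are stand-ins chosen only to show the principle.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit digest


def partition_for(partition_key: str, num_partitions: int) -> int:
    """Return the index of the partition that owns this key's hash.

    Toy model: each partition owns one equally sized, contiguous slice
    of the hash space. The real service's function is not public.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    value = int.from_bytes(digest, "big")  # 0 .. HASH_SPACE - 1
    range_size = HASH_SPACE // num_partitions
    # Clamp so the leftover values at the very top of the hash space
    # (when HASH_SPACE is not divisible) still map to the last partition.
    return min(value // range_size, num_partitions - 1)


# Writing and reading use the same function, so both land on the same partition.
write_target = partition_for("A", 4)
read_target = partition_for("A", 4)
```

The point of the sketch is only that the mapping is deterministic: the same partition key always resolves to the same partition without any lookup table per item.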

Partition keys should be designed in a way that distributes your workload evenly, instead of leaving some partitions overloaded while others sit idle.
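
The effect of key choice can be seen in a small simulation. This is a sketch under assumptions, not a measurement of DynamoDB: the partition count, the MD5-modulo placement in `toy_partition`, and the example key values are all hypothetical.

```python
import hashlib
from collections import Counter

NUM_PARTITIONS = 4  # hypothetical partition count


def toy_partition(partition_key: str) -> int:
    # Stand-in placement rule: hash the key, then take it modulo the
    # partition count. The real service uses its own internal function.
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % NUM_PARTITIONS


def load_per_partition(keys) -> Counter:
    """Count how many requests land on each partition."""
    return Counter(toy_partition(key) for key in keys)


# Low-cardinality key (e.g. a status flag): at most two partitions see traffic.
skewed = load_per_partition(["active", "inactive"] * 500)

# High-cardinality key (e.g. a user id): traffic spreads over all partitions.
spread = load_per_partition(f"user-{i}" for i in range(1000))
```

With the same 1,000 requests, the status-flag key can never touch more than two partitions, while the user-id key reaches every partition with a much smaller peak load on any single one.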

You can read more about this here:

http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.Partitions.html

https://cloudacademy.com/blog/dynamodb-replication-and-partitioning-part-4/

Muhammad Soliman Avatar answered Sep 30 '22 14:09
