
How does the DynamoDB partition key work?

I'm trying to understand how partitions are created for DynamoDB tables.

According to this blog, "All items with the same partition key are stored together", so if I have a table with user ids from 1 to 1000, does that mean I will have 1000 partitions? Or is it up to the "internal hash function"? If so, how do we know how many partitions there will be?

It later suggests using a random suffix from 1 to 10 to distribute data evenly across partitions, but how does it know to query 10 times for a given invoice number? Is that only when you have 10 partitions? In this case you could have thousands of invoice numbers. Does that mean the same number of partitions will be created, and the same number of queries made to look up one invoice number?

user1883793 asked Aug 09 '17 04:08




3 Answers

When an Amazon DynamoDB table is created, you can specify the desired throughput in Reads per second and Writes per second. The table will then be provisioned across multiple servers (partitions) sufficient to provide the requested throughput.

You do not have visibility into the number of partitions created -- it is fully managed by DynamoDB. Additional partitions will be created as the quantity of data increases or when the provisioned throughput is increased.

Let's say you have requested 1000 Reads per second and the data has been internally partitioned across 10 servers (10 partitions). Each partition will provide 100 Reads per second. If all Read requests are for the same partition key, the throughput will be limited to 100 Reads per second. If the requests are spread over a range of different values, the throughput can be the full 1000 Reads per second.
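The arithmetic in this example can be sketched in a few lines. This is a simplified model that assumes provisioned throughput is split evenly across partitions, as described above; the function name is made up for illustration and is not part of any AWS API.

```python
def per_partition_reads(table_reads_per_second: int, partition_count: int) -> float:
    """Reads per second available to one partition, assuming an even split."""
    if partition_count <= 0:
        raise ValueError("partition_count must be positive")
    return table_reads_per_second / partition_count


# 1000 provisioned reads per second spread over 10 partitions.
# If every request targets one partition key, this is the ceiling it hits.
limit_for_single_key = per_partition_reads(1000, 10)
print(limit_for_single_key)
```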

If many queries are made for the same Partition Key, it can result in a Hot Partition that limits the total available throughput.

Think of it like a bank with lines in front of teller windows. If everybody lines up at one teller, fewer customers can be served. It is more efficient to distribute customers across many different teller windows. A good partition key for distributing customers might be the customer number, since it is different for each customer. A poor partition key might be their zip code, because they all live in the same area near the bank.

The simple rule is that you should choose a Partition Key that has a range of different values.

See: Partitions and Data Distribution

John Rotenstein answered Oct 16 '22 14:10


As per the AWS blog post "Choosing the Right DynamoDB Partition Key":

Choosing the Right DynamoDB Partition Key is an important step in the design and building of scalable and reliable applications on top of DynamoDB.

What is a partition key?

DynamoDB supports two types of primary keys:

Partition key: Also known as a hash key, the partition key is composed of a single attribute. Attributes in DynamoDB are similar in many ways to fields or columns in other database systems.

Partition key and sort key: Referred to as a composite primary key or hash-range key, this type of key is composed of two attributes. The first attribute is the partition key, and the second attribute is the sort key.

Why do I need a partition key?

DynamoDB stores data as groups of attributes, known as items. Items are similar to rows or records in other database systems. DynamoDB stores and retrieves each item based on the primary key value, which must be unique. Items are distributed across 10 GB storage units, called partitions (physical storage internal to DynamoDB). Each table has one or more partitions, as shown in Figure 2. For more information, see Understand Partition Behavior in the DynamoDB Developer Guide.

DynamoDB uses the partition key’s value as an input to an internal hash function. The output from the hash function determines the partition in which the item will be stored. Each item’s location is determined by the hash value of its partition key.
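To make the many-keys-to-few-partitions idea concrete, here is a minimal sketch. DynamoDB's real hash function and its partition mapping are internal and not published, so MD5 and the modulo step below are stand-ins chosen only for illustration, and `partition_for` is a hypothetical name.

```python
import hashlib
from collections import Counter


def partition_for(partition_key: str, partition_count: int) -> int:
    """Map a partition key to a partition index (illustrative stand-in only)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    # Treat the first 8 bytes of the digest as an integer and fold it
    # into the available partitions.
    return int.from_bytes(digest[:8], "big") % partition_count


# 1000 distinct user ids land in only 4 partitions.
counts = Counter(partition_for(str(user_id), 4) for user_id in range(1, 1001))
for partition, item_count in sorted(counts.items()):
    print(f"partition {partition}: {item_count} keys")
```

The same key always hashes to the same partition, while the number of partitions is independent of the number of distinct keys.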

All items with the same partition key are stored together, and for composite primary keys, are ordered by the sort key value. DynamoDB will split partitions by sort key if the collection size grows bigger than 10 GB.
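A small in-memory model may help picture "stored together and ordered by sort key". This is only a sketch of the behaviour described above, not how DynamoDB is implemented; the class and method names are hypothetical.

```python
import bisect
from collections import defaultdict


class ItemCollections:
    """Groups items by partition key and keeps each group sorted by sort key."""

    def __init__(self):
        # partition key -> list of (sort_key, item), kept in ascending sort-key order
        self._collections = defaultdict(list)

    def put(self, partition_key, sort_key, item):
        entries = self._collections[partition_key]
        sort_keys = [existing_key for existing_key, _ in entries]
        index = bisect.bisect_left(sort_keys, sort_key)
        if index < len(entries) and entries[index][0] == sort_key:
            entries[index] = (sort_key, item)  # same primary key: overwrite
        else:
            entries.insert(index, (sort_key, item))

    def query(self, partition_key):
        """Return all items for one partition key in sort-key order."""
        return [item for _, item in self._collections.get(partition_key, [])]


store = ItemCollections()
store.put("customer-1", "2017-08-09", {"total": 5})
store.put("customer-1", "2017-01-01", {"total": 7})
store.put("customer-2", "2017-03-15", {"total": 9})
print(store.query("customer-1"))
```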


Recommendations for partition keys

Use high-cardinality attributes. These are attributes that have distinct values for each item, like e-mail id, employee_no, customerid, sessionid, orderid, and so on.

Use composite attributes. Try to combine more than one attribute to form a unique key, if that meets your access pattern. For example, consider an orders table with customerid+productid+countrycode as the partition key and order_date as the sort key.
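As a rough illustration of the orders example, the composite value can be built by joining the attributes into one string. The `#` delimiter and the attribute names `pk` and `order_date` are assumptions made for this sketch; any separator that cannot appear inside the individual values would do.

```python
def composite_partition_key(customer_id: str, product_id: str, country_code: str) -> str:
    """Join several attributes into a single partition key value."""
    return "#".join((customer_id, product_id, country_code))


# Shape of one item in the hypothetical orders table.
order_item = {
    "pk": composite_partition_key("cust-42", "prod-7", "US"),  # partition key
    "order_date": "2017-08-09",                                # sort key
    "quantity": 3,
}
print(order_item["pk"])
```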

Cache the popular items when there is a high volume of read traffic. The cache acts as a low-pass filter, preventing reads of unusually popular items from swamping partitions. For example, consider a table that has deals information for products. Some deals are expected to be more popular than others during major sale events like Black Friday or Cyber Monday.
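The caching recommendation can be sketched with a small read-through cache. Everything here is hypothetical: `fetch_deal_from_table` stands in for a real table read, and a production setup would more likely use a shared cache service than a per-process dictionary.

```python
import time
from typing import Any, Callable, Dict, Tuple


class ReadThroughCache:
    """Tiny in-memory cache with a time-to-live, placed in front of a slow lookup."""

    def __init__(self, fetch: Callable[[str], Any], ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}  # key -> (expiry, value)

    def get(self, key: str) -> Any:
        now = self._clock()
        cached = self._entries.get(key)
        if cached is not None and cached[0] > now:
            return cached[1]  # still fresh: no table read
        value = self._fetch(key)
        self._entries[key] = (now + self._ttl, value)
        return value


table_reads = []  # records every lookup that actually reaches the table


def fetch_deal_from_table(deal_id: str) -> dict:
    table_reads.append(deal_id)
    return {"deal_id": deal_id, "discount": 0.5}


deals = ReadThroughCache(fetch_deal_from_table, ttl_seconds=30.0)
for _ in range(1000):
    deals.get("black-friday-tv")
print(len(table_reads))
```

Within the time-to-live, repeated reads of the popular item are served from memory, so only the first one reaches the partition that holds it.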

Add random numbers/digits from a predetermined range for write-heavy use cases. If you expect a large volume of writes for a partition key, use an additional prefix or suffix (a fixed number from a predetermined range, say 1-10) and add it to the partition key. For example, consider a table of invoice transactions. A single invoice can contain thousands of transactions per client.
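The suffix technique is easier to follow in code. The sketch below uses a plain dictionary as a stand-in for the table, and the names `SHARD_COUNT`, `sharded_key`, `write_transaction`, and `read_invoice` are invented for illustration. The point it tries to show is that the application chooses the suffix range, so the application also knows how many keys to query when reading an invoice back.

```python
import random
from collections import defaultdict

SHARD_COUNT = 10  # size of the suffix range (1-10), chosen by the application

# Stand-in for a DynamoDB table: partition key value -> list of items.
table = defaultdict(list)


def sharded_key(invoice_number: str, shard: int) -> str:
    return f"{invoice_number}.{shard}"


def write_transaction(invoice_number: str, transaction) -> None:
    """Spread writes for one invoice over SHARD_COUNT partition key values."""
    shard = random.randint(1, SHARD_COUNT)
    table[sharded_key(invoice_number, shard)].append(transaction)


def read_invoice(invoice_number: str) -> list:
    """Query every suffix in the known range and merge the results."""
    results = []
    for shard in range(1, SHARD_COUNT + 1):
        results.extend(table.get(sharded_key(invoice_number, shard), []))
    return results


for transaction_id in range(500):
    write_transaction("INV-121212", transaction_id)

print(len(read_invoice("INV-121212")))
```

The ten reads come from the suffix range, not from the number of physical partitions: the suffix only creates ten distinct partition key values per invoice, and DynamoDB still decides on its own where each of those values is stored.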

Read more: Choosing the Right DynamoDB Partition Key

LuFFy answered Oct 16 '22 14:10


Point of confusion:

Other answers already explain in detail how DynamoDB creates partitions. So without going into those details, let me explain the root cause of the confusion when trying to understand the relationship between Partition Keys and Partitions in DynamoDB.

  • IMHO, naming the key "Partition Key" is the cause of the confusion. It should just be called Primary Key. On hearing Partition Key, our minds start relating each Partition Key to one Partition, a one-to-one relationship, which is not the case. As mentioned in the question itself, the key is an input to the "internal hash function". The output of that function is the actual reference to the partition.

  • Thus, for a table with 1000 user ids (Partition Keys), DynamoDB need not have 1000 partitions. It may have 1, 5, 10, or any other number of partitions, as determined by the throughput (capacity unit) setting you have specified.

  • Partitions may increase when you increase the throughput setting.

  • The number of partitions can also increase as the volume of your data grows, when the existing partitions cannot hold it.

  • Hence, what we call Partition Key in DynamoDB is nothing but the Primary Key representing a unique item in the table (with the help of the sort key, in the case of a composite key). It does not relate one-to-one to a partition (which is a storage allocation unit for the table, backed by SSD). The actual key to a partition is obtained by passing this partition key to an internal hash function.
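As a rough mental model of the last few points, older versions of the DynamoDB documentation described an approximate partition count driven by provisioned throughput and table size. The sketch below follows that description, assuming about 3,000 read capacity units, 1,000 write capacity units, and 10 GB per partition. DynamoDB does not expose the real count and its behaviour has changed over time, so treat the result as an estimate only; the function name is hypothetical.

```python
import math


def estimated_partitions(read_capacity_units: float,
                         write_capacity_units: float,
                         table_size_gb: float) -> int:
    """Approximate partition count from throughput and size (historical rule of thumb)."""
    by_throughput = math.ceil(read_capacity_units / 3000 + write_capacity_units / 1000)
    by_size = math.ceil(table_size_gb / 10)
    # A table always has at least one partition.
    return max(by_throughput, by_size, 1)


# Small table, modest throughput: one partition regardless of how many keys exist.
print(estimated_partitions(1000, 500, 5))
# Raising throughput increases the estimate.
print(estimated_partitions(6000, 2000, 5))
# Growing data volume increases it too.
print(estimated_partitions(100, 100, 95))
```

Note that the number of distinct partition key values does not appear anywhere in the calculation.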

More details here.

Dexter answered Oct 16 '22 13:10