
DynamoDb batchGetItem and Partition Key and Sort Key

I tried to use batchGetItem to return attributes of more than one item from a table, but it seems to work only with the combination of the partition key and range key. What if I want to identify the requested items only by the partition (hash) key? Is the only way to create the table without the range key?

    // Adding items
    $client->putItem(array(
        'TableName' => $table,
        'Item' => array(
            'id'     => array('S' => '2a49ab04b1534574e578a08b8f9d7441'),
            'name'   => array('S' => 'test1'),
            'user_name'   => array('S' => 'aaa.bbb')
        )
    ));

    // Adding items
    $client->putItem(array(
        'TableName' => $table,
        'Item' => array(
            'id'     => array('S' => '4fd70b72cc21fab4f745a6073326234d'),
            'name'   => array('S' => 'test2'),
            'user_name'   => array('S' => 'aaaa.bbbb'),
            'user_name1'   => array('S' => 'aaaaa.bbbbb')
        )
    ));

    // Fetch both items by hash key + range key
    $result = $client->batchGetItem(array(
        'RequestItems' => array(
            $table => array(
                'Keys' => array(
                    array(
                        'id'   => array('S' => '2a49ab04b1534574e578a08b8f9d7441'), // hash key
                        'name' => array('S' => 'test1'),                            // range key
                    ),
                    array(
                        'id'   => array('S' => '4fd70b72cc21fab4f745a6073326234d'), // hash key
                        'name' => array('S' => 'test2'),                            // range key
                    ),
                ),
            ),
        ),
    ));

As per the official documentation:

http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.Partitions.html

If the table has a composite primary key (partition key and sort key), DynamoDB calculates the hash value of the partition key in the same way as described in Data Distribution: Partition Key—but it stores all of the items with the same partition key value physically close together, ordered by sort key value.

What are the advantages of using a partition key and sort key, besides storing all items with the same partition key value physically close together?

As per the official documentation:

A single operation can retrieve up to 16 MB of data, which can contain as many as 100 items. BatchGetItem will return a partial result if the response size limit is exceeded, the table's provisioned throughput is exceeded, or an internal processing failure occurs.

How should I handle the request if I need more than 100 items? Should I just loop through the items in my code and request 100 at a time, or is there another way to achieve this with the AWS SDK for DynamoDB?

Example of table creation:

    $client->createTable(array(
        'TableName' => $table,
        'AttributeDefinitions' => array(
            array(
                'AttributeName' => 'id',
                // 'S' (string): the items above store id as a string, so 'N'
                // would make putItem fail with a key type mismatch
                'AttributeType' => 'S'
            ),
            array(
                'AttributeName' => 'name',
                'AttributeType' => 'S'
            )
        ),
        'KeySchema' => array(
            array(
                'AttributeName' => 'id',
                'KeyType'       => 'HASH'
            ),
            array(
                'AttributeName' => 'name',
                'KeyType'       => 'RANGE'
            )
        ),
        'ProvisionedThroughput' => array(
            'ReadCapacityUnits'  => 5,
            'WriteCapacityUnits' => 5
        )
    ));

Thanks

UPDATE - question about Mark B's answer:

Yes you can create an index without a range key. The range key is entirely optional. However, even if you have a range key defined it is optional to include it in your query. You can simply specify the hash key in your query to get all items with the hash key, which will be returned in an order based on the range key.

If I specify only the hash key in my BatchGetItem request on a table with both a hash key and a range key, I get the error below. If I specify only the hash key on a table without a range key, it works. Please note that the table has no index.

An uncaught Exception was encountered

Type:        Aws\DynamoDb\Exception\DynamoDbException
Message:     Error executing "BatchGetItem" on "https://dynamodb.eu-central-1.amazonaws.com"; AWS HTTP error: Client error: `POST https://dynamodb.eu-central-1.amazonaws.com` resulted in a `400 Bad Request` response:
{"__type":"com.amazon.coral.validate#ValidationException","message":"The provided key element does not match the schema" (truncated...)
 ValidationException (client): The provided key element does not match the schema - {"__type":"com.amazon.coral.validate#ValidationException","message":"The provided key element does not match the schema"}
Filename:    /var/app/vendor/aws/aws-sdk-php/src/WrappedHttpHandler.php
asked Sep 12 '17 11:09 by Berlin




2 Answers

You've asked a lot of questions, so I'll try to break them down. (Sorry, I can't answer with PHP code snippets.)

I tried to use batchGetItem to return attributes of more than one item from a table, but it seems to work only with the combination of the partition key and range key. What if I want to identify the requested items only by the partition (hash) key? Is the only way to create the table without the range key?

BatchGetItem is equivalent to multiple GetItem calls, each of which retrieves zero or one item. You give each one the unique key (primary key) of the item you wish to retrieve: if your table has only a partition key, that is all you specify; otherwise you specify both the partition key and the range key. BatchGetItem bundles those GetItem calls into one request to DynamoDB.
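To make the distinction concrete, here is a minimal PHP sketch (plain arrays, no SDK call) that builds the `Keys` list for BatchGetItem. The helper name `buildKeys` is hypothetical, string-typed keys are assumed, and the attribute names `id` and `name` come from the question's table.

```php
<?php
// Sketch: build the "Keys" list for BatchGetItem. Every entry must contain the
// table's FULL primary key: the partition key alone for a partition-key-only
// table, or partition key + sort key for a composite-key table.
// buildKeys is a hypothetical helper; string ('S') key types are assumed.

/**
 * @param array $ids  list of [partitionValue] or [partitionValue, sortValue]
 */
function buildKeys(array $ids, string $pkName, ?string $skName = null): array
{
    $keys = [];
    foreach ($ids as $pair) {
        $key = [$pkName => ['S' => $pair[0]]];
        if ($skName !== null) {
            // Composite key: omitting the sort key is what produces
            // "The provided key element does not match the schema".
            if (!isset($pair[1])) {
                throw new InvalidArgumentException("Missing sort key value for {$pair[0]}");
            }
            $key[$skName] = ['S' => $pair[1]];
        }
        $keys[] = $key;
    }
    return $keys;
}

// Composite-key table (id + name), as in the question:
$composite = buildKeys([['2a49ab04b1534574e578a08b8f9d7441', 'test1']], 'id', 'name');
// Partition-key-only table:
$hashOnly = buildKeys([['2a49ab04b1534574e578a08b8f9d7441']], 'id');
// Either list goes into ['RequestItems' => [$table => ['Keys' => $keys]]].
```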

If you wish to query for multiple items for a given Partition Key, you want to look at the Query API.
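A minimal sketch of the Query parameters for "all items with this partition key", assuming the question's string partition key `id`. The table name `my_table`, the helper `buildPartitionQuery` and the placeholders `#pk`/`:pk` are illustrative choices, not part of the original answers.

```php
<?php
// Sketch: parameters for a Query that fetches every item sharing one
// partition key value. Only the partition key is required in the key
// condition; results come back ordered by the sort key.
// buildPartitionQuery and 'my_table' are hypothetical names.

function buildPartitionQuery(string $table, string $pkName, string $pkValue): array
{
    return [
        'TableName'                 => $table,
        'KeyConditionExpression'    => '#pk = :pk',
        // Attribute-name placeholders avoid clashes with DynamoDB reserved words.
        'ExpressionAttributeNames'  => ['#pk' => $pkName],
        'ExpressionAttributeValues' => [':pk' => ['S' => $pkValue]],
    ];
}

$params = buildPartitionQuery('my_table', 'id', '2a49ab04b1534574e578a08b8f9d7441');
// With the SDK: $result = $client->query($params); then read $result['Items'].
// Large result sets are paginated via 'LastEvaluatedKey' / 'ExclusiveStartKey'.
```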

What are the advantages of using a partition key and sort key, besides storing all items with the same partition key value physically close together?

This is a difficult question to answer, as it heavily depends on the unique key of your data model.

Some advantages that come to mind are:

1. Sort keys let you sort the data on that attribute (in ascending or descending order).
2. Sort keys support more comparison operators (e.g. greater than, less than, between, begins with). See the docs.
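Both advantages show up in a Query's key condition. The sketch below assumes the question's schema (partition key `id`, string sort key `name`); `buildSortKeyQuery` and `my_table` are hypothetical. `ScanIndexForward` controls the sort direction, and `begins_with` is one of the conditions that can only be applied to the sort key.

```php
<?php
// Sketch: sort-key conditions that only a composite key makes possible.
// Assumes partition key 'id' and string sort key 'name' (the question's table);
// buildSortKeyQuery and 'my_table' are hypothetical names.

function buildSortKeyQuery(string $table, string $pkValue, string $namePrefix, bool $descending): array
{
    return [
        'TableName'                 => $table,
        // begins_with() applies to the sort key; BETWEEN, <, <=, >, >= work too.
        'KeyConditionExpression'    => '#pk = :pk AND begins_with(#sk, :prefix)',
        // NAME is a DynamoDB reserved word, so it must go through a placeholder.
        'ExpressionAttributeNames'  => ['#pk' => 'id', '#sk' => 'name'],
        'ExpressionAttributeValues' => [
            ':pk'     => ['S' => $pkValue],
            ':prefix' => ['S' => $namePrefix],
        ],
        // false = descending sort-key order; true (the default) = ascending.
        'ScanIndexForward'          => !$descending,
    ];
}

$params = buildSortKeyQuery('my_table', '2a49ab04b1534574e578a08b8f9d7441', 'test', true);
// With the SDK: $result = $client->query($params);
```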

How should I handle the request if I need more than 100 items? Should I just loop through the items in my code and request 100 at a time, or is there another way to achieve this with the AWS SDK for DynamoDB?

If you request more than 100 items, BatchGetItem returns a ValidationException with the message "Too many items requested for the BatchGetItem call". You will need to loop through the keys 100 at a time to get all the items you need. Keep in mind that there is also a 16 MB response size limit; any keys that were not processed (because of that limit, throttling, or an internal failure) are returned in the response under "UnprocessedKeys".
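A minimal sketch of the "100 at a time" loop, assuming the keys are already in DynamoDB's attribute-value format. `buildBatchRequests`, `BATCH_GET_LIMIT` and `my_table` are illustrative names; the code only builds the request arrays, each of which would then be sent with `$client->batchGetItem()`.

```php
<?php
// Sketch: split an arbitrary list of keys into BatchGetItem requests of at
// most 100 keys each (the per-call limit). UnprocessedKeys still has to be
// handled for every individual call.

const BATCH_GET_LIMIT = 100;

function buildBatchRequests(string $table, array $keys): array
{
    $requests = [];
    foreach (array_chunk($keys, BATCH_GET_LIMIT) as $chunk) {
        $requests[] = ['RequestItems' => [$table => ['Keys' => $chunk]]];
    }
    return $requests;
}

// 250 hypothetical keys -> requests of 100, 100 and 50 keys.
$keys = [];
for ($i = 0; $i < 250; $i++) {
    $keys[] = ['id' => ['S' => "item-$i"]];
}
$requests = buildBatchRequests('my_table', $keys);
```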

If DynamoDB returns any unprocessed items, you should retry the batch operation on those items. However, we strongly recommend that you use an exponential backoff algorithm. If you retry the batch operation immediately, the underlying read or write requests can still fail due to throttling on the individual tables. If you delay the batch operation using exponential backoff, the individual requests in the batch are much more likely to succeed.
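One common way to implement the recommended exponential backoff is "full jitter": wait a random time between zero and an exponentially growing ceiling. This is a sketch only; `backoffDelayMs` is a hypothetical helper, and the 50 ms base and 5 s cap are arbitrary assumptions rather than values from the DynamoDB documentation.

```php
<?php
// Sketch: "full jitter" exponential backoff for retrying UnprocessedKeys.
// The base (50 ms) and cap (5 s) are example values, not DynamoDB requirements.

function backoffDelayMs(int $attempt, int $baseMs = 50, int $capMs = 5000): int
{
    // Exponential ceiling base * 2^attempt, capped so it never grows unbounded.
    $ceiling = (int) min($capMs, $baseMs * (2 ** $attempt));
    // Random delay in [0, ceiling] so concurrent clients don't retry in lockstep.
    return random_int(0, $ceiling);
}

// Before retry number $attempt (0, 1, 2, ...):
// usleep(backoffDelayMs($attempt) * 1000);
```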

This documentation explains how to use it.

answered Oct 03 '22 09:10 by Abhaya Chauhan


But what if I want to identify the requested items only by the partition (hash) key? Is the only way to create the table without the range key?

Yes, you can create a table (or index) without a range key. The range key is entirely optional. However, even if you have a range key defined, it is optional to include it in a Query: you can simply specify the hash key to get all items with that hash key, which will be returned in an order based on the range key. (GetItem and BatchGetItem are different: they always require the complete primary key.)
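For comparison with the question's composite-key table, here is a sketch of `createTable` parameters for a partition-key-only table. The table name `my_table_hash_only` is hypothetical. With this schema, GetItem and BatchGetItem need only `id`.

```php
<?php
// Sketch: createTable parameters for a table keyed by partition key only.
// 'my_table_hash_only' is a hypothetical table name.

$params = [
    'TableName'             => 'my_table_hash_only',
    'AttributeDefinitions'  => [
        // Only key attributes are declared; other attributes are schemaless.
        ['AttributeName' => 'id', 'AttributeType' => 'S'],
    ],
    'KeySchema'             => [
        ['AttributeName' => 'id', 'KeyType' => 'HASH'], // no RANGE entry
    ],
    'ProvisionedThroughput' => [
        'ReadCapacityUnits'  => 5,
        'WriteCapacityUnits' => 5,
    ],
];
// With the SDK: $client->createTable($params);
```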

What are the advantages of using a partition key and sort key, besides storing all items with the same partition key value physically close together?

The two fields combined are your primary key, which guarantees uniqueness. The range/sort key also determines the order that results are returned in.

How should I handle the request if I need more than 100 items?

From the documentation (emphasis mine):

The maximum number of item attributes that can be retrieved for a single operation is 100. Also, the number of items retrieved is constrained by a 1 MB size limit. If the response size limit is exceeded or a partial result is returned due to an internal processing failure, Amazon DynamoDB returns an UnprocessedKeys value so you can retry the operation starting with the next item to get.

For example, even if you ask to retrieve 100 items, but each individual item is 50k in size, the system returns 20 items and an appropriate UnprocessedKeys value so you can get the next page of results. If necessary, your application needs its own logic to assemble the pages of results into one set.

So you would need to check the UnprocessedKeys value of the result and continue making requests in your application until there are no more UnprocessedKeys.
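The loop described above can be sketched as follows, under stated assumptions: `batchGetAll` is a hypothetical helper, `$send` is any callable that performs one BatchGetItem call and returns the response as an array, and each batch holds at most 100 keys. The retry delay doubles on every attempt, and the helper gives up after a fixed number of attempts instead of looping forever.

```php
<?php
// Sketch: keep re-sending UnprocessedKeys until none are left.
// $send is a hypothetical callable wrapping the SDK, e.g.
//   function (array $ri) use ($client) {
//       return $client->batchGetItem(['RequestItems' => $ri])->toArray();
//   }
// Assumes at most 100 keys in $requestItems (split larger sets first).

function batchGetAll(callable $send, array $requestItems, int $maxAttempts = 8): array
{
    $items = [];
    for ($attempt = 0; !empty($requestItems); $attempt++) {
        if ($attempt >= $maxAttempts) {
            throw new RuntimeException("UnprocessedKeys remained after $maxAttempts attempts");
        }
        if ($attempt > 0) {
            // Exponential backoff before a retry: 50 ms, 100 ms, ... capped at 1.6 s.
            usleep(min(1600, 50 * (2 ** ($attempt - 1))) * 1000);
        }
        $response = $send($requestItems);
        foreach ($response['Responses'] ?? [] as $table => $tableItems) {
            foreach ($tableItems as $item) {
                $items[$table][] = $item;
            }
        }
        // Whatever DynamoDB did not process this time becomes the next request.
        $requestItems = $response['UnprocessedKeys'] ?? [];
    }
    return $items;
}
```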

answered Oct 03 '22 11:10 by Mark B