
DynamoDB: query using more than two attributes

In DynamoDB, the attributes you want to query on have to be declared in an index.

How can I make a query using more than two attributes?

Example using boto.

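from boto.dynamodb2.fields import HashKey, RangeKey, GlobalAllIndex
from boto.dynamodb2.table import Table
from boto.dynamodb2.types import NUMBER

# conn: a DynamoDB connection, assumed to be created elsewhere
Table.create('users',
        schema=[
            HashKey('id')  # defaults to STRING data_type
        ], throughput={
            'read': 5,
            'write': 15,
        }, global_indexes=[
            # Index for lookups by first name, sorted by creation date
            GlobalAllIndex('FirstnameTimeIndex', parts=[
                HashKey('first_name'),
                RangeKey('creation_date', data_type=NUMBER),
            ],
            throughput={
                'read': 1,
                'write': 1,
            }),
            # Index for lookups by last name, sorted by creation date
            GlobalAllIndex('LastnameTimeIndex', parts=[
                HashKey('last_name'),
                RangeKey('creation_date', data_type=NUMBER),
            ],
            throughput={
                'read': 1,
                'write': 1,
            })
        ],
        connection=conn)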

How can I look for users with first name 'John', last name 'Doe', and created on '3-21-2015' using boto?

asked Mar 21 '15 20:03 by Juan Pablo



People also ask

How many attributes can a DynamoDB item have?

There is no limit on the number of attributes, but the total item size is limited to 400 KB. That size includes both the attribute names (binary length in UTF-8) and the attribute values (again, binary length).

How many items can a DynamoDB Query return?

A single Query operation can retrieve a maximum of 1 MB of data. This limit applies before any FilterExpression or ProjectionExpression is applied to the results.

How can I improve my DynamoDB Query performance?

For faster response times, design your tables and indexes so that your applications can use Query instead of Scan. (For tables, you can also consider using the GetItem and BatchGetItem APIs.) Alternatively, design your application to use Scan operations in a way that minimizes the impact on your request rate.

Can DynamoDB have multiple hash keys?

Using normal DynamoDB operations you're allowed to query either only one hash key per request (using GetItem or Query operations) or all hash keys at once (using the Scan operation).


1 Answer

Your data modeling process has to take your data retrieval requirements into consideration: in DynamoDB you can only query by hash key, or by hash key plus range key.

If querying by primary key is not enough for your requirements, you can certainly have alternate keys by creating secondary indexes (Local or Global).

However, in certain scenarios you can concatenate multiple attributes into your primary key to avoid the cost of maintaining secondary indexes.

If you need to get users by First Name, Last Name and Creation Date, I would suggest including those attributes in the Hash and Range Key, so that no additional indexes are needed.

The Hash Key should contain a value that your application can compute and that, at the same time, provides uniform data access. For example, say that you choose to define your keys as follows:

Hash Key (name): first_name#last_name

Range Key (created) : MM-DD-YYYY-HH-mm-SS-milliseconds

You can always append additional attributes in case the ones mentioned are not enough to make your key unique across the table.
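As a rough sketch of how an application might compute these two key values, the helpers below build them from plain Python values. The function names (make_name_key, make_created_key, day_prefix) and the three-digit millisecond padding are assumptions made for illustration; they are not part of boto or of the original answer.

```python
from datetime import date, datetime

SEPARATOR = "#"  # assumed delimiter; it must never occur inside a name part


def make_name_key(first_name, last_name):
    """Build the hash key value, e.g. 'John#Doe'."""
    for part in (first_name, last_name):
        if SEPARATOR in part:
            raise ValueError("name part contains the separator: %r" % part)
    return first_name + SEPARATOR + last_name


def make_created_key(moment):
    """Build the range key value as MM-DD-YYYY-HH-mm-SS-milliseconds."""
    millis = moment.microsecond // 1000
    return moment.strftime("%m-%d-%Y-%H-%M-%S") + "-%03d" % millis


def day_prefix(day):
    """Prefix shared by every range key value created on the given day."""
    return day.strftime("%m-%d-%Y")


created = make_created_key(datetime(2015, 3, 21, 3, 3, 2, 324000))
print(make_name_key("John", "Doe"))
print(created)
print(created.startswith(day_prefix(date(2015, 3, 21))))
```

Rejecting names that contain the separator keeps two different name pairs from producing the same hash key value.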

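from boto.dynamodb2.fields import HashKey, RangeKey
from boto.dynamodb2.table import Table

# 'name' holds first_name#last_name, 'created' holds the timestamp string
users = Table.create('users', schema=[
        HashKey('name'),
        RangeKey('created'),
    ], throughput={
        'read': 5,
        'write': 15,
    })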

Adding the user to the table:

with users.batch_write() as batch:
    batch.put_item(data={
        'name': 'John#Doe',
        'first_name': 'John',
        'last_name': 'Doe',
        'created': '03-21-2015-03-03-02-3243',
    })

Your code to find the user John Doe, created on '03-21-2015' should be something like:

# Hash key must match exactly; range key is matched by its date prefix
name_john_doe = users.query_2(
    name__eq='John#Doe',
    created__beginswith='03-21-2015'
)

for user in name_john_doe:
    print(user['first_name'])
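To see what this key condition selects without a live table, here is a small in-memory stand-in. It is not boto code: the matches function and the sample items are hypothetical, and the sketch only mimics the rule "hash key equals a value and range key begins with a prefix".

```python
def matches(item, name, created_prefix):
    """Mimic the key condition: name == value AND created begins with prefix."""
    return item["name"] == name and item["created"].startswith(created_prefix)


# Hypothetical items, shaped like the rows written to the 'users' table.
items = [
    {"name": "John#Doe", "first_name": "John",
     "created": "03-21-2015-03-03-02-3243"},
    {"name": "John#Doe", "first_name": "John",
     "created": "03-22-2015-10-15-00-0001"},
    {"name": "Jane#Doe", "first_name": "Jane",
     "created": "03-21-2015-08-00-00-0500"},
]

found = [item for item in items if matches(item, "John#Doe", "03-21-2015")]
for user in found:
    print(user["first_name"], user["created"])
```

Only the first item is selected: the second has the right name but a different day, and the third has the right day but a different name.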

Important Considerations:

i. If your query starts to get too complicated, and the Hash or Range Key gets too long because it concatenates too many fields, then definitely use Secondary Indexes. That is a good sign that the primary index alone is not enough for your requirements.

ii. I mentioned that the Hash Key should provide uniform data access:

"Dynamo uses consistent hashing to partition its key space across its replicas and to ensure uniform load distribution. A uniform key distribution can help us achieve uniform load distribution assuming the access distribution of keys is not highly skewed." [DYN]

Not only does the Hash Key allow you to uniquely identify the record, it is also the mechanism that ensures load distribution. The Range Key (when used) indicates which records will mostly be retrieved together, so the storage can also be optimized for that need.
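The effect of key variety on load can be illustrated with a toy model. This is only a sketch: DynamoDB's real partitioning function is internal, so the SHA-256 hash, the partition count of 4, and the function names below are all assumptions chosen for the demo.

```python
import hashlib
from collections import Counter

PARTITIONS = 4  # hypothetical partition count, chosen only for the demo


def partition_for(hash_key):
    """Map a hash key value to a partition index (illustrative only)."""
    digest = hashlib.sha256(hash_key.encode("utf-8")).hexdigest()
    return int(digest, 16) % PARTITIONS


def load_per_partition(hash_keys):
    """Count how many requests land on each partition."""
    return Counter(partition_for(key) for key in hash_keys)


# 1000 requests spread over 1000 distinct hash key values.
varied = load_per_partition("user%d#Doe" % i for i in range(1000))
# 1000 requests that all target the same hash key value.
skewed = load_per_partition("John#Doe" for _ in range(1000))

print(sorted(varied.items()))
print(sorted(skewed.items()))
```

In this model the varied keys spread across the partitions, while the repeated key sends every request to a single partition, which is the skew the quoted passage warns about.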

The link below has a complete explanation about the topic:

http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GuidelinesForTables.html#GuidelinesForTables.UniformWorkload

answered Oct 04 '22 20:10 by b-s-d
