DynamoDB: range vs. global secondary index

Tags:

database

amazon-dynamodb

Let's say I have a user table with id and timestamp attributes. I would like to be able to query on both parameters. If I understand the documentation correctly, there are two ways of doing this with DynamoDB:

1. Define a hash+range primary key using id as the hash and timestamp as the range.
2. Define a hash-only primary key using id and define a global secondary index using timestamp.

What are the benefits and drawbacks of each approach?

631

asked Apr 24 '14 18:04

David Jones

1 Answer

> Define a hash+range primary key using id as the hash and timestamp as the range.

By making id the Hash Key and timestamp the Range Key, you are effectively creating a 'composite primary key'.

In other words, your DynamoDB schema would allow the following data (notice that 'john' appears three times):

id (Hash) | timestamp (Range)
----------|-------------------------
john      | 2014-04-28T07:53:29.000Z
john      | 2014-04-28T08:53:29.000Z
john      | 2014-04-28T09:53:29.000Z
mary      | 2014-04-28T07:53:29.000Z
jane      | 2014-04-28T07:53:29.000Z

And you can perform these operations:

1. GetItem to get a single item based on the id (Hash Key) + timestamp (Range Key) combination
2. Query to get a list of all items that share a given id (Hash Key), optionally narrowed by a condition on timestamp (Range Key)

If this is not what you intended, then a hash + range key on id and timestamp respectively is not what you are looking for.
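To make the semantics of the two operations above concrete, here is a toy in-memory model of a hash+range table. It is only a sketch: `CompositeKeyTable` and its methods are hypothetical names invented for illustration, not part of any DynamoDB SDK, and it assumes timestamps are ISO-8601 strings in a single fixed format so that string order matches chronological order.

```python
class CompositeKeyTable:
    """Toy stand-in for a hash+range table (illustration only, not the DynamoDB API)."""

    def __init__(self):
        # hash key (id) -> {range key (timestamp) -> item}
        self._partitions = {}

    def put_item(self, item):
        # Items sharing only the id live side by side; items sharing the
        # full (id, timestamp) pair overwrite each other.
        self._partitions.setdefault(item["id"], {})[item["timestamp"]] = item

    def get_item(self, user_id, timestamp):
        # Mirrors GetItem: the full key (hash AND range) is required.
        return self._partitions.get(user_id, {}).get(timestamp)

    def query(self, user_id, start=None, end=None):
        # Mirrors Query: the hash key is mandatory, the range condition optional.
        rows = self._partitions.get(user_id, {})
        return [
            rows[ts]
            for ts in sorted(rows)
            if (start is None or ts >= start) and (end is None or ts <= end)
        ]


table = CompositeKeyTable()
for user_id, ts in [
    ("john", "2014-04-28T07:53:29.000Z"),
    ("john", "2014-04-28T08:53:29.000Z"),
    ("john", "2014-04-28T09:53:29.000Z"),
    ("mary", "2014-04-28T07:53:29.000Z"),
    ("jane", "2014-04-28T07:53:29.000Z"),
]:
    table.put_item({"id": user_id, "timestamp": ts})

print(len(table.query("john")))  # 3
print(len(table.query("john", start="2014-04-28T08:00:00.000Z")))  # 2
```

Note that there is no way to ask this model for "all items with a given timestamp" without walking every partition, which is the gap the second approach tries to fill.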

> Define a hash-only primary key using id and define a global secondary index using timestamp.

With a hash-only primary key on id, each id must be unique.

id (Hash) | timestamp (GSI Hash Key)
----------|-------------------------
john      | 2014-04-28T07:53:29.000Z
mary      | 2014-04-28T07:53:29.000Z
jane      | 2014-04-28T07:53:29.000Z

Then, by adding a hash-only GSI on timestamp, you would be able to query for the list of ids that have a particular timestamp (an exact match, since a hash key only supports equality conditions).
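As a rough sketch of what this setup might look like, the functions below only build the request parameters for DynamoDB's `CreateTable` and `Query` operations as plain dictionaries, so they run without AWS access. The table name `users`, the index name `timestamp-index`, the function names, and the capacity values of 1 are all placeholders chosen for illustration. `timestamp` is aliased as `#ts` because it appears on DynamoDB's reserved-word list.

```python
def create_table_params(table_name="users", index_name="timestamp-index"):
    """Build CreateTable kwargs: hash-only key on id, hash-only GSI on timestamp."""
    throughput = {"ReadCapacityUnits": 1, "WriteCapacityUnits": 1}  # placeholder values
    return {
        "TableName": table_name,
        # Only attributes used in a key schema are declared up front.
        "AttributeDefinitions": [
            {"AttributeName": "id", "AttributeType": "S"},
            {"AttributeName": "timestamp", "AttributeType": "S"},
        ],
        "KeySchema": [{"AttributeName": "id", "KeyType": "HASH"}],
        "ProvisionedThroughput": dict(throughput),
        "GlobalSecondaryIndexes": [
            {
                "IndexName": index_name,
                "KeySchema": [{"AttributeName": "timestamp", "KeyType": "HASH"}],
                # KEYS_ONLY projects just the index key and the table key (id).
                "Projection": {"ProjectionType": "KEYS_ONLY"},
                # In provisioned mode the index carries its own throughput.
                "ProvisionedThroughput": dict(throughput),
            }
        ],
    }


def query_by_timestamp_params(timestamp, table_name="users", index_name="timestamp-index"):
    """Build Query kwargs that look up all ids for one exact timestamp via the GSI."""
    return {
        "TableName": table_name,
        "IndexName": index_name,
        "KeyConditionExpression": "#ts = :ts",
        "ExpressionAttributeNames": {"#ts": "timestamp"},
        "ExpressionAttributeValues": {":ts": {"S": timestamp}},
    }
```

Assuming boto3 is installed and credentials are configured, these dictionaries could be passed to the low-level client, for example `boto3.client("dynamodb").create_table(**create_table_params())` followed by `.query(**query_by_timestamp_params("2014-04-28T07:53:29.000Z"))`.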

The benefit of this approach is that it is definitely the correct solution for your use case. The first approach (hash+range) is a misuse of the range key, unless you intend to ensure at the application level that id is not duplicated, which is probably a bad idea.

The drawbacks of using a GSI are:

1. ~~There can only be a maximum of 5 GSI per table, so choose wisely what you want indexed~~ DynamoDB Update Dec 2019 - You can now create as many as 20 GSIs per table, and this soft limit can be raised further on request: https://aws.amazon.com/about-aws/whats-new/2018/12/amazon-dynamodb-increases-the-number-of-global-secondary-indexes-and-projected-index-attributes-you-can-create-per-table/
2. A GSI will cost you additional money, as you will need to assign Provisioned Throughput to it in addition to the table's own.
3. A GSI is eventually consistent: DynamoDB does not guarantee that an item is visible through the index the moment it is written to the table. The DynamoDB documentation states that propagation is usually immediate, but it can take up to seconds before the item can be found via its GSI hash key.
4. You cannot perform GetItem on a GSI to obtain an item based on its Hash Key or its Hash Key + Range Key. You are limited to Query, which returns a list.
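Drawbacks 3 and 4 together mean that a read through the index right after a write may return an empty list. One possible way to cope, sketched below, is to retry the index query with a short backoff. This is an assumption about how an application might handle the lag, not a documented DynamoDB recipe; `query_index_with_retry` and `run_query` are hypothetical names, where `run_query` stands for any zero-argument callable that performs the GSI Query and returns the list of matching items.

```python
import time


def query_index_with_retry(run_query, attempts=5, delay_seconds=0.2, sleep=time.sleep):
    """Call run_query() until it returns at least one item or attempts run out.

    Returns the (possibly empty) list of items from the last call.
    """
    for attempt in range(attempts):
        items = run_query()
        if items:
            return items
        if attempt < attempts - 1:
            # Simple exponential backoff between tries: 0.2s, 0.4s, 0.8s, ...
            sleep(delay_seconds * (2 ** attempt))
    return []


# Usage with a fake query that only "sees" the item on the third call,
# standing in for an index that has not caught up yet.
calls = {"count": 0}


def fake_index_query():
    calls["count"] += 1
    return [] if calls["count"] < 3 else [{"id": "john"}]


result = query_index_with_retry(fake_index_query, sleep=lambda seconds: None)
print(result)  # [{'id': 'john'}]
```

An empty result after all attempts is ambiguous: the item may not exist at all, or the index may still be behind, so the caller has to decide how to treat that case.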

106

answered Oct 18 '22 14:10

Oh Chin Boon
