Let's say I have a user table with id and timestamp attributes. I would like to be able to query on both parameters. If I understand the documentation correctly, there are two ways of doing this with DynamoDB:
1. Define a hash+range primary key using id as the hash and timestamp as the range.
2. Define a hash-only primary key using id and define a global secondary index using timestamp.
What are the benefits and drawbacks of each approach?
Global secondary index: an index with a partition key and a sort key that can be different from those on the base table. A global secondary index is considered "global" because queries on the index can span all of the data in the base table, across all partitions. A global secondary index has no size limitations and has its own provisioned throughput settings for read and write activity that are separate from those of the table.
Local secondary index: an index that has the same partition key as the base table, but a different sort key.
DynamoDB supports two different kinds of indexes: Global secondary indexes – The primary key of the index can be any two attributes from its table. Local secondary indexes – The partition key of the index must be the same as the partition key of its table. However, the sort key can be any other attribute.
The sort key of an item is also known as its range attribute. The term range attribute derives from the way DynamoDB stores items with the same partition key physically close together, in sorted order by the sort key value. Each primary key attribute must be a scalar (meaning that it can hold only a single value).
Define a hash+range primary key using id as the hash and timestamp as the range.
By making id the Hash Key and timestamp the Range Key, you are effectively creating a 'composite primary key'.
In other words, your DynamoDB schema would allow the following data (notice that 'john' is repeated three times):
id (Hash) | timestamp (Range)
----------|-------------------------
john | 2014-04-28T07:53:29.000Z
john | 2014-04-28T08:53:29.000Z
john | 2014-04-28T09:53:29.000Z
mary | 2014-04-28T07:53:29.000Z
jane | 2014-04-28T07:53:29.000Z
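A minimal sketch of that composite key as a table definition, written as the keyword arguments one might pass to boto3's `create_table`. The table name `users`, the string attribute types, and on-demand billing are assumptions made for illustration; none of them come from the question.

```python
def composite_key_table_params(table_name="users"):
    """Build create_table arguments for an id (hash) + timestamp (range) key.

    The table name, string types, and on-demand billing are illustrative
    assumptions, not part of the original question.
    """
    return {
        "TableName": table_name,
        # Only attributes that appear in a key schema are declared up front.
        "AttributeDefinitions": [
            {"AttributeName": "id", "AttributeType": "S"},
            {"AttributeName": "timestamp", "AttributeType": "S"},
        ],
        # HASH + RANGE together form the composite primary key, so the same
        # id may repeat as long as the timestamp differs.
        "KeySchema": [
            {"AttributeName": "id", "KeyType": "HASH"},
            {"AttributeName": "timestamp", "KeyType": "RANGE"},
        ],
        "BillingMode": "PAY_PER_REQUEST",
    }


params = composite_key_table_params()
# With boto3 installed and AWS credentials configured, this could be sent as:
#   boto3.client("dynamodb").create_table(**params)
```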
And you can perform these operations:
- GetItem to get a single item based on the id (Hash Key) + timestamp (Range Key) combination
- Query to get a list of all items equal to the id (Hash Key)
If this is not what you intended, then hash + range on id and timestamp respectively is not what you are looking for.
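The two operations above can be sketched as request parameters for the low-level boto3 client. The helper names, the `users` table name, and the string-typed attributes are hypothetical and chosen only for illustration.

```python
def get_item_params(user_id, timestamp, table_name="users"):
    """GetItem needs the full primary key: hash key and range key together."""
    return {
        "TableName": table_name,
        "Key": {"id": {"S": user_id}, "timestamp": {"S": timestamp}},
    }


def query_by_id_params(user_id, table_name="users"):
    """Query needs only the hash key and matches every item that shares it."""
    return {
        "TableName": table_name,
        "KeyConditionExpression": "#id = :id",
        # A name placeholder avoids clashes with DynamoDB reserved words.
        "ExpressionAttributeNames": {"#id": "id"},
        "ExpressionAttributeValues": {":id": {"S": user_id}},
    }


# With a configured client (client = boto3.client("dynamodb")):
#   client.get_item(**get_item_params("john", "2014-04-28T07:53:29.000Z"))
#   client.query(**query_by_id_params("john"))["Items"]
```

Against the sample data above, the Query for `john` would match three items, while GetItem addresses exactly one id and timestamp pair.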
Define a hash-only primary key using id and define a global secondary index using timestamp.
Using a hash-only primary key on id, id must be unique.
id (Hash) | timestamp (GSI Hash Key)
----------|-------------------------
john | 2014-04-28T07:53:29.000Z
mary | 2014-04-28T07:53:29.000Z
jane | 2014-04-28T07:53:29.000Z
Then, by applying a hash-only GSI on timestamp, you would be able to query for a list of ids for a particular timestamp.
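A rough sketch of this setup, again as boto3 request parameters. The index name `timestamp-index`, the capacity values, and the `KEYS_ONLY` projection are assumptions made for illustration rather than details from the question.

```python
def gsi_table_params(table_name="users", index_name="timestamp-index"):
    """create_table arguments: hash-only key on id plus a hash-only GSI on timestamp."""
    throughput = {"ReadCapacityUnits": 5, "WriteCapacityUnits": 5}  # assumed values
    return {
        "TableName": table_name,
        "AttributeDefinitions": [
            {"AttributeName": "id", "AttributeType": "S"},
            {"AttributeName": "timestamp", "AttributeType": "S"},
        ],
        "KeySchema": [{"AttributeName": "id", "KeyType": "HASH"}],
        "ProvisionedThroughput": dict(throughput),
        "GlobalSecondaryIndexes": [
            {
                "IndexName": index_name,
                "KeySchema": [{"AttributeName": "timestamp", "KeyType": "HASH"}],
                # KEYS_ONLY projects just the index key and the table key (id).
                "Projection": {"ProjectionType": "KEYS_ONLY"},
                # The index carries its own capacity, separate from the table's.
                "ProvisionedThroughput": dict(throughput),
            }
        ],
    }


def ids_by_timestamp_query_params(timestamp, table_name="users",
                                  index_name="timestamp-index"):
    """Query arguments that look up items through the GSI instead of the table."""
    return {
        "TableName": table_name,
        "IndexName": index_name,
        "KeyConditionExpression": "#ts = :ts",
        # "timestamp" is a DynamoDB reserved word, so it goes behind a placeholder.
        "ExpressionAttributeNames": {"#ts": "timestamp"},
        "ExpressionAttributeValues": {":ts": {"S": timestamp}},
    }


# With a configured client (client = boto3.client("dynamodb")):
#   client.create_table(**gsi_table_params())
#   client.query(**ids_by_timestamp_query_params("2014-04-28T07:53:29.000Z"))
```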
The benefit of this approach is that it is definitely the correct solution for your use case. #1 is a misuse of the range key (unless you intend to ensure at the application level that id is not duplicated, which is probably a bad idea).
The drawbacks to using a GSI are:
- There is a limit on the number of GSIs per table, so choose wisely what you want indexed. That limit has since been increased, and it is a soft limit that can be raised further through a request: https://aws.amazon.com/about-aws/whats-new/2018/12/amazon-dynamodb-increases-the-number-of-global-secondary-indexes-and-projected-index-attributes-you-can-create-per-table/
- A GSI will cost you additional money, as you will need to assign Provisioned Throughput to it.
- A GSI is eventually consistent, meaning DynamoDB does not guarantee that the moment an item is written to the table under its hash key, the item's GSI hash key immediately becomes available for querying. The DynamoDB documentation states that this is usually immediate, but it can take up to seconds for the GSI hash key to become available.
- You cannot use GetItem on a GSI to obtain an item based on its Hash Key or Hash Key + Range Key. You are limited to Query, which returns a list.
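Because a Query on a GSI always comes back as a list, code that expects a single item has to unwrap it. A minimal sketch, assuming the response is the dictionary returned by the low-level boto3 client's `query` call; the helper name and the sample responses are made up for illustration.

```python
def first_item_or_none(query_response):
    """Return the first item of a Query response, or None when nothing matched."""
    items = query_response.get("Items", [])
    return items[0] if items else None


# Hand-written stand-ins shaped like Query responses, not real output.
sample_hit = {"Items": [{"id": {"S": "john"}}], "Count": 1}
sample_miss = {"Items": [], "Count": 0}

print(first_item_or_none(sample_hit))   # {'id': {'S': 'john'}}
print(first_item_or_none(sample_miss))  # None
```

Since a GSI key does not have to be unique, taking the first item is only safe when the application already knows at most one item can match.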