DynamoDB: range vs. global secondary index

Let's say I have a user table with id and timestamp attributes. I would like to be able to query on both parameters. If I understand the documentation correctly, there are two ways of doing this with DynamoDB:

  1. Define a hash+range primary key using id as the hash and timestamp as the range.
  2. Define a hash-only primary key using id and define a global secondary index using timestamp.

What are the benefits and drawbacks of each approach?

David Jones asked Apr 24 '14 18:04



1 Answer

Define a hash+range primary key using id as the hash and timestamp as the range.

By making id the Hash Key and timestamp the Range Key, you are effectively creating a 'composite primary key'.

In other words, your DynamoDB schema would allow the following data (notice that 'john' appears three times):

id (Hash) | timestamp (Range)
----------|-------------------------
john      | 2014-04-28T07:53:29.000Z
john      | 2014-04-28T08:53:29.000Z
john      | 2014-04-28T09:53:29.000Z
mary      | 2014-04-28T07:53:29.000Z
jane      | 2014-04-28T07:53:29.000Z

And you can perform these operations:

  1. GetItem to get a single item based on the id (Hash Key) + timestamp (Range Key) combination.
  2. Query to get a list of all items whose id (Hash Key) equals a given value, optionally narrowed by a condition on timestamp (Range Key).

If this is not what you intended, then a hash + range key on id and timestamp respectively is not what you are looking for.
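
To make the composite-key behaviour concrete, here is a minimal in-memory sketch. It is a toy model, not the DynamoDB API: the class `CompositeKeyTable` and its methods are hypothetical names chosen for illustration. It only shows that several items may share an id, that id + timestamp together identify one item, and that a query on id returns a list ordered by timestamp.

```python
class CompositeKeyTable:
    """Toy model of a hash+range table (illustration only, not DynamoDB)."""

    def __init__(self):
        # hash key (id) -> list of items, kept sorted by range key (timestamp)
        self._partitions = {}

    def put_item(self, item):
        rows = self._partitions.setdefault(item["id"], [])
        # The same id + timestamp replaces the existing item:
        # the composite primary key must be unique.
        rows[:] = [r for r in rows if r["timestamp"] != item["timestamp"]]
        rows.append(item)
        rows.sort(key=lambda r: r["timestamp"])

    def get_item(self, user_id, timestamp):
        """Return the single item with this id + timestamp, or None."""
        for row in self._partitions.get(user_id, []):
            if row["timestamp"] == timestamp:
                return row
        return None

    def query(self, user_id, start=None, end=None):
        """Return all items for one id, optionally limited to a timestamp range."""
        return [
            row for row in self._partitions.get(user_id, [])
            if (start is None or row["timestamp"] >= start)
            and (end is None or row["timestamp"] <= end)
        ]


table = CompositeKeyTable()
for user_id, ts in [
    ("john", "2014-04-28T09:53:29.000Z"),
    ("john", "2014-04-28T07:53:29.000Z"),
    ("john", "2014-04-28T08:53:29.000Z"),
    ("mary", "2014-04-28T07:53:29.000Z"),
]:
    table.put_item({"id": user_id, "timestamp": ts})

john_items = table.query("john")
one_item = table.get_item("john", "2014-04-28T08:53:29.000Z")
```

Note that there is no way in this model to ask for "everything at a given timestamp" without visiting every id, which mirrors why a range key alone does not let you query by timestamp across users.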

Define a hash-only primary key using id and define a global secondary index using timestamp.

With a hash-only primary key on id, each id must be unique.

id (Hash) | timestamp (GSI Hash Key)
----------|-------------------------
john      | 2014-04-28T07:53:29.000Z
mary      | 2014-04-28T07:53:29.000Z
jane      | 2014-04-28T07:53:29.000Z

Then, by defining a hash-only GSI on timestamp, you would be able to query for the list of ids that share a particular timestamp.
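
As a rough sketch of what this could look like, the snippet below builds the request parameters for the low-level DynamoDB `CreateTable` and `Query` operations as plain Python dictionaries. The table name `users`, the index name `timestamp-index`, the capacity values and the helper `gsi_query_params` are assumptions made for illustration; nothing here contacts AWS.

```python
# Assumed names (not from the original post): table "users", index "timestamp-index".
TABLE_NAME = "users"
INDEX_NAME = "timestamp-index"

create_table_params = {
    "TableName": TABLE_NAME,
    # Only attributes used in a key schema need to be declared.
    "AttributeDefinitions": [
        {"AttributeName": "id", "AttributeType": "S"},
        {"AttributeName": "timestamp", "AttributeType": "S"},
    ],
    # Hash-only primary key on id.
    "KeySchema": [{"AttributeName": "id", "KeyType": "HASH"}],
    "ProvisionedThroughput": {"ReadCapacityUnits": 1, "WriteCapacityUnits": 1},
    "GlobalSecondaryIndexes": [
        {
            "IndexName": INDEX_NAME,
            # Hash-only GSI on timestamp.
            "KeySchema": [{"AttributeName": "timestamp", "KeyType": "HASH"}],
            # KEYS_ONLY projects just the index key and the table key (id).
            "Projection": {"ProjectionType": "KEYS_ONLY"},
            # The GSI has its own throughput, billed separately from the table.
            "ProvisionedThroughput": {"ReadCapacityUnits": 1, "WriteCapacityUnits": 1},
        }
    ],
}


def gsi_query_params(timestamp):
    """Build Query parameters that look up ids by timestamp through the GSI."""
    return {
        "TableName": TABLE_NAME,
        "IndexName": INDEX_NAME,
        # "#ts" is a placeholder so the attribute name cannot clash
        # with a DynamoDB reserved word.
        "KeyConditionExpression": "#ts = :ts",
        "ExpressionAttributeNames": {"#ts": "timestamp"},
        "ExpressionAttributeValues": {":ts": {"S": timestamp}},
    }


params = gsi_query_params("2014-04-28T07:53:29.000Z")

# With boto3 installed and AWS credentials configured, these could be sent as:
#   client = boto3.client("dynamodb")
#   client.create_table(**create_table_params)
#   response = client.query(**params)   # response["Items"] is a list
```

The query goes through the index rather than the table, so the result is always a list, even when only one id matches.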

The benefit of this approach is that it is the correct solution for your use case. Option #1 is a misuse of the range key (unless you intend to ensure at the application level that id is not duplicated, which is probably a bad idea).

The drawbacks of using a GSI are:

  1. There can only be a maximum of 5 GSIs per table, so choose wisely what you want indexed. Update Dec 2019: you can now create as many as 20 GSIs per table, and can further raise this soft limit through a request: https://aws.amazon.com/about-aws/whats-new/2018/12/amazon-dynamodb-increases-the-number-of-global-secondary-indexes-and-projected-index-attributes-you-can-create-per-table/
  2. A GSI will cost you additional money, as you will need to assign Provisioned Throughput to it.
  3. A GSI is eventually consistent: DynamoDB does not guarantee that, the moment an item is written to the table under its hash key, the item's GSI hash key immediately becomes available for querying. The DynamoDB documentation states that this is usually immediate, but it can take seconds for the GSI hash key to become available.
  4. You cannot perform GetItem on a GSI to obtain an item based on its Hash Key / Hash Key + Range Key. You are limited to using Query, which returns a list.
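
One way an application might cope with the eventual consistency described in drawback 3 is to retry a GSI read briefly when it expects an item to be there. The helper below is a minimal sketch under that assumption: `poll_until_visible` and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for a real GSI query so that the example runs without AWS.

```python
import time


def poll_until_visible(fetch, attempts=5, delay_seconds=0.2):
    """Call fetch() until it returns a non-empty list or attempts run out."""
    for attempt in range(attempts):
        result = fetch()
        if result:
            return result
        # Wait before the next try, but not after the final one.
        if attempt < attempts - 1:
            time.sleep(delay_seconds)
    return []


# Stand-in for a GSI query: the item only "appears" on the third call,
# imitating an index that lags slightly behind the table.
calls = {"count": 0}


def fake_fetch():
    calls["count"] += 1
    return ["john"] if calls["count"] >= 3 else []


found = poll_until_visible(fake_fetch, attempts=5, delay_seconds=0.01)
```

This only makes sense when the caller knows the item should exist; an empty result after all attempts may simply mean nothing matches.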

Oh Chin Boon answered Oct 18 '22 14:10