 

Optional secondary indexes in DynamoDB

I am migrating my persistence tier from Riak to DynamoDB. My data model contains an optional business identifier field, which I would like to be able to query as an alternative to the key.

It appears that DynamoDB secondary indexes can't be null and require a range key, so despite the similar name to Riak's secondary indexes, they appear to be quite a different beast.

Is there an elegant way to efficiently query my optional field, short of throwing the data in an external search index?

nullPainter asked Sep 10 '13 09:09





1 Answer

When you asked this question, DynamoDB did not have Global Secondary Indexes: http://aws.amazon.com/about-aws/whats-new/2013/12/12/announcing-amazon-dynamodb-global-secondary-indexes/

Now, it does.

A local secondary index is best thought of, and functions as, a secondary range key. @andreimarinescu is right: you still must query by the item's hash key; a local secondary index only lets you apply a limited subset of DynamoDB's query comparison operators (greater than, equal to, less than, and so on) to that secondary range key. So you still need to know which "hash bucket" you're performing the comparison within.

Global secondary indexes are a bit of a different beast. They are more like a secondary version of your table (and Amazon charges you similarly in terms of provisioned throughput). You can use non-primary key attributes of your table as primary key attributes of your index in a global secondary index, and query them accordingly.

For example, if your table looks like:

| **Hash key**: Item ID | **Range key**: Serial No | **Attribute**: Business ID |
|-----------------------|--------------------------|----------------------------|
| 1                     | 12345                    | 1A                         |
| 2                     | 45678                    | 2B                         |
| 3                     | 34567                    | (empty)                    |
| 3                     | 12345                    | 2B                         |

Then, with a local secondary index on Business ID, you could perform queries like "find all the items with a hash key of 3 and a business ID equal to 2B", but you could not do "find all items with a business ID equal to 2B", because a local secondary index still requires the table's hash key.
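As a rough sketch of what that local secondary index query might look like, here are the request parameters for the low-level `Query` operation built as a plain Python dict. The table name `Items`, the index name `BusinessIdLocalIndex`, and the attribute names `ItemId` and `BusinessId` are all made up for illustration; they are not from the question.

```python
# Hypothetical names: table "Items", index "BusinessIdLocalIndex",
# attributes "ItemId" (number) and "BusinessId" (string).
def lsi_query_params(item_id, business_id):
    """Build Query parameters for a lookup on a local secondary index."""
    return {
        "TableName": "Items",
        "IndexName": "BusinessIdLocalIndex",
        # The hash key condition is mandatory. The index only supplies an
        # alternate range key (BusinessId) within that hash key.
        "KeyConditionExpression": "ItemId = :id AND BusinessId = :bid",
        "ExpressionAttributeValues": {
            ":id": {"N": str(item_id)},  # numbers are sent as strings
            ":bid": {"S": business_id},
        },
    }

# "Find all the items with a hash key of 3 and a business ID equal to 2B".
params = lsi_query_params(3, "2B")
```

With boto3 installed and credentials configured, something like `boto3.client("dynamodb").query(**params)` should send it. Note there is no way to drop the `ItemId` condition here, which is exactly the limitation described above.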

If you were to add a global secondary index using business ID, then you could perform such queries. You would essentially be providing an alternate primary key for the table. You could perform a query like "find all items with a business ID equal to 2B" and get items 2-45678 and 3-12345 as a response.
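A minimal sketch of how such a global secondary index could be declared and queried, again as plain dicts in the shape the low-level `CreateTable` and `Query` operations accept. Every name here (`Items`, `BusinessIdIndex`, `ItemId`, `SerialNo`, `BusinessId`) is an assumption for the example, and the throughput numbers are placeholders.

```python
# Hypothetical names and placeholder throughput values throughout.
create_table_params = {
    "TableName": "Items",
    # Every attribute used in a table or index key must be declared here.
    "AttributeDefinitions": [
        {"AttributeName": "ItemId", "AttributeType": "N"},
        {"AttributeName": "SerialNo", "AttributeType": "N"},
        {"AttributeName": "BusinessId", "AttributeType": "S"},
    ],
    "KeySchema": [
        {"AttributeName": "ItemId", "KeyType": "HASH"},
        {"AttributeName": "SerialNo", "KeyType": "RANGE"},
    ],
    "ProvisionedThroughput": {"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
    "GlobalSecondaryIndexes": [
        {
            "IndexName": "BusinessIdIndex",
            # The index has its own hash key; no range key is needed.
            "KeySchema": [{"AttributeName": "BusinessId", "KeyType": "HASH"}],
            # KEYS_ONLY returns the index key plus the table's primary key.
            "Projection": {"ProjectionType": "KEYS_ONLY"},
            "ProvisionedThroughput": {
                "ReadCapacityUnits": 1,
                "WriteCapacityUnits": 1,
            },
        }
    ],
}

# "Find all items with a business ID equal to 2B": no ItemId required.
gsi_query_params = {
    "TableName": "Items",
    "IndexName": "BusinessIdIndex",
    "KeyConditionExpression": "BusinessId = :bid",
    "ExpressionAttributeValues": {":bid": {"S": "2B"}},
}
```

With boto3, these would presumably be passed as `client.create_table(**create_table_params)` and `client.query(**gsi_query_params)`. The `KEYS_ONLY` projection is just one choice; it keeps the index small if all you need back is the item's primary key.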

Sparse indexes work fine with DynamoDB: it's perfectly acceptable for some items to have no business ID. Because only items that have one are written to the index, you may be able to provision less throughput for the index than for the table, depending on how many items you expect to have a business ID.
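To make the sparse behaviour concrete, here is a toy in-memory model of it in plain Python. This is not the DynamoDB API, just an illustration using the example table above, with made-up attribute names (`ItemId`, `SerialNo`, `BusinessId`) and a hypothetical helper `build_sparse_index`.

```python
from collections import defaultdict

# The example table; the third row simply has no BusinessId attribute.
items = [
    {"ItemId": 1, "SerialNo": 12345, "BusinessId": "1A"},
    {"ItemId": 2, "SerialNo": 45678, "BusinessId": "2B"},
    {"ItemId": 3, "SerialNo": 34567},
    {"ItemId": 3, "SerialNo": 12345, "BusinessId": "2B"},
]

def build_sparse_index(rows, key_attr):
    """Map each key_attr value to the primary keys of the rows that have it."""
    index = defaultdict(list)
    for row in rows:
        if key_attr in row:  # rows lacking the attribute are left out
            index[row[key_attr]].append((row["ItemId"], row["SerialNo"]))
    return index

index = build_sparse_index(items, "BusinessId")
matches = index["2B"]
indexed_count = sum(len(keys) for keys in index.values())

print(matches)        # [(2, 45678), (3, 12345)]
print(indexed_count)  # 3
```

Only three of the four items end up in the index, which is why the index can be smaller, and cheaper to provision, than the table.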

rpmartz answered Oct 17 '22 06:10
