I am migrating my persistence tier from Riak to DynamoDB. My data model contains an optional business identifier field, which is desired to be able to be queried as an alternative to the key.
It appears that DynamoDB secondary indexes can't be null
and require a range key, so despite the similar name to Riak's secondary indexes, make this appear quite a different beast.
Is there an elegant way to efficiently query my optional field, short of throwing the data in an external search index?
DynamoDB supports two types of secondary indexes: Global secondary index — An index with a partition key and a sort key that can be different from those on the base table. A global secondary index is considered "global" because queries on the index can span all of the data in the base table, across all partitions.
Secondary indexes are a critical part of modeling your data in DynamoDB. With a secondary index, you can add additional access patterns to your application without the hassle of maintaining multiple copies of the same data. DynamoDB will handle all replication from the base table to your secondary index.
In a DynamoDB table, each key value must be unique. However, the key values in a global secondary index do not need to be unique.
Index limits per table. You may create 20 global secondary indexes and 5 local secondary indexes per table.
When you asked this question, DynamoDB did not have Global Secondary Indexes: http://aws.amazon.com/about-aws/whats-new/2013/12/12/announcing-amazon-dynamodb-global-secondary-indexes/
Now, it does.
A local secondary index is best thought of, and functions as, a secondary range key. @andreimarinescu is right: you still must query by the item's hash key, only with a secondary index you can use a limited subset of a DynamoDB query's comparison operators on that range key (e.g. greater than, equal to, less than, etc.) So, you still need to know which "hash bucket" you're performing the comparison within.
Global secondary indexes are a bit of a different beast. They are more like a secondary version of your table (and Amazon charges you similarly in terms of provisioned throughput). You can use non-primary key attributes of your table as primary key attributes of your index in a global secondary index, and query them accordingly.
For example, if your table looks like:
|**Hash key**: Item ID | **Range Key**: Serial No | **Attribute**: Business ID |
--------------------------------------------------------------------------------
| 1 | 12345 | 1A |
--------------------------------------------------------------------------------
| 2 | 45678 | 2B |
--------------------------------------------------------------------------------
| 3 | 34567 | (empty) |
--------------------------------------------------------------------------------
| 3 | 12345 | 2B |
--------------------------------------------------------------------------------
Then, with a local secondary index on Business ID
you could perform queries like, "find all the items with a hash key of 3
and a business ID equal to 2B
", but you could not do "find all items with a business ID equal to 2B
" because the secondary index requires a hash key.
If you were to add a global secondary index using business ID, then you could perform such queries. You would essentially be providing an alternate primary key for the table. You could perform a query like "find all items with a business ID equal to 2B
and get items 2-45678
and 3-12345
as a response.
Sparse indexes work fine with DynamoDB; it's perfectly allowable that not all the items have a business ID and can allow you to keep the provisioned throughput on your index lower than that of the table depending on how many items you anticipate having a business ID.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With