Why are local secondary indexes only allowed on a hash and range key (not on just a hash?)

When creating a DynamoDB table in the console, why is the "local secondary index" option unavailable if you choose a hash-only primary key (rather than a hash and range key)?

My use case is storing an activity feed for each user, so a hash key on userid would be logical. Additionally, I'd like a local secondary index with date_created as its range key so that I can always query for the most recent n records.

Should I be using a primary key of uid and date_created in this case, even though it is theoretically possible for two items to have the same date_created?
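
One way to handle the collision concern is to make the range key a composite string: a fixed-width UTC timestamp followed by a unique suffix, so two items created at the same instant still get distinct primary keys while sorting chronologically. The sketch below only builds plain Python values and a parameter dictionary in the shape accepted by boto3's low-level DynamoDB client `query`; the table name `Activity`, the `#` separator, and the helper names are hypothetical, and it assumes the range attribute `date_created` stores this composite string.

```python
from datetime import datetime, timezone
from typing import Optional
import uuid


def make_sort_key(created: datetime, unique_id: Optional[str] = None) -> str:
    """Return a range-key value like '2015-07-24T19:07:00.000000+00:00#<id>'.

    Hypothetical format: a fixed-width UTC ISO-8601 timestamp (which sorts
    lexically in chronological order) plus a '#<id>' suffix so that two items
    created in the same microsecond do not share a primary key.
    Assumes `created` is timezone-aware.
    """
    ts = created.astimezone(timezone.utc).isoformat(timespec="microseconds")
    return f"{ts}#{unique_id or uuid.uuid4().hex}"


def latest_activity_query(table_name: str, user_id: str, n: int) -> dict:
    """Parameters for a low-level Query returning one user's n newest items."""
    return {
        "TableName": table_name,
        "KeyConditionExpression": "userid = :uid",
        "ExpressionAttributeValues": {":uid": {"S": user_id}},
        "ScanIndexForward": False,  # descending range-key order: newest first
        "Limit": n,
    }


if __name__ == "__main__":
    when = datetime(2015, 7, 24, 19, 7, tzinfo=timezone.utc)
    print(make_sort_key(when, "a1"))  # 2015-07-24T19:07:00.000000+00:00#a1
    print(latest_activity_query("Activity", "user-123", 10))
    # With boto3 (not run here): boto3.client("dynamodb").query(**params)
```

Because the timestamp is fixed-width and always UTC, plain string comparison on the range key follows time order, which is what `ScanIndexForward: False` relies on to return the newest items first.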

MonkeyBonkey asked Jul 24 '15 at 19:07



People also ask

What is a local secondary index in DynamoDB?

A local secondary index maintains an alternate sort key for a given partition key value. A local secondary index also contains a copy of some or all of the attributes from its base table. You specify which attributes are projected into the local secondary index when you create the table.
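
Because a local secondary index keeps the base table's partition (hash) key and only swaps in an alternate sort (range) key, it has nothing to vary unless the table already has a range key; DynamoDB rejects an LSI on a hash-only table. The sketch below shows the shape of `create_table` parameters (boto3 client style) with an LSI on `date_created`, plus a hypothetical local helper, `check_lsi`, that mirrors those rules. The table name `Activity`, the range attribute `activity_id`, and the index name are assumptions for illustration.

```python
from typing import Dict, List, Optional


def key_schema(hash_attr: str, range_attr: Optional[str] = None) -> List[Dict[str, str]]:
    """Build a DynamoDB KeySchema list (HASH first, optional RANGE)."""
    schema = [{"AttributeName": hash_attr, "KeyType": "HASH"}]
    if range_attr is not None:
        schema.append({"AttributeName": range_attr, "KeyType": "RANGE"})
    return schema


def check_lsi(table_schema: List[Dict[str, str]], lsi_schema: List[Dict[str, str]]) -> Optional[str]:
    """Hypothetical client-side check; returns an error message or None."""
    table_keys = {k["KeyType"]: k["AttributeName"] for k in table_schema}
    lsi_keys = {k["KeyType"]: k["AttributeName"] for k in lsi_schema}
    if "RANGE" not in table_keys:
        return "table has no range key, so it cannot have a local secondary index"
    if lsi_keys.get("HASH") != table_keys["HASH"]:
        return "LSI must reuse the table's hash key"
    if lsi_keys.get("RANGE") in (None, table_keys["RANGE"]):
        return "LSI must define a different range key"
    return None


create_table_params = {
    "TableName": "Activity",  # hypothetical table name
    "AttributeDefinitions": [
        {"AttributeName": "userid", "AttributeType": "S"},
        {"AttributeName": "activity_id", "AttributeType": "S"},  # hypothetical
        {"AttributeName": "date_created", "AttributeType": "S"},
    ],
    "KeySchema": key_schema("userid", "activity_id"),
    "LocalSecondaryIndexes": [
        {
            "IndexName": "by_date_created",  # hypothetical index name
            "KeySchema": key_schema("userid", "date_created"),
            "Projection": {"ProjectionType": "ALL"},
        }
    ],
    "BillingMode": "PAY_PER_REQUEST",
}

if __name__ == "__main__":
    lsi = create_table_params["LocalSecondaryIndexes"][0]
    print(check_lsi(create_table_params["KeySchema"], lsi["KeySchema"]))  # None
    print(check_lsi(key_schema("userid"), key_schema("userid", "date_created")))
    # With boto3 (not run here): boto3.client("dynamodb").create_table(**create_table_params)
```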

Can we add a local secondary index to an existing DynamoDB table?

You cannot add a local secondary index to an existing table; it must be defined when the table is created. This is different from global secondary indexes, which can be added to an existing table.

How many secondary indexes are allowed in DynamoDB?

Each table in DynamoDB can have up to 20 global secondary indexes (default quota) and 5 local secondary indexes. For more information about the differences between global secondary indexes and local secondary indexes, see Improving data access with secondary indexes.





1 Answer

Use a global secondary index.

First off, storing time series data in DynamoDB is hard, but not impossible. It sounds like you want a way to get the records with the most recent date_created across the entire table. A useful way to think about a GSI in DynamoDB is as its own table, except that its HASH/RANGE key combinations do not have to be unique.

With a global secondary index, you can define your own hash key and range key on any other attributes, and the combination doesn't need to be unique. You will want the hash key to be the leading part of the date, such as 'YYYY-MM', 'YYYY-MM-DD', or 'YYYY-MM-DD-HH', depending on how many records you have and what kind of performance you need. You then use the full date as the range key and project only the attributes you need (the fewer the better, again depending on your use case). The reason for splitting the date this way is to avoid hot spots in the database.
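
The date-prefix scheme can be sketched as follows: each item gets an extra attribute holding the leading part of its timestamp, and a GSI uses that attribute as its hash key with the full `date_created` value as its range key. The attribute name `date_bucket`, the index name, the projected attribute `activity_type`, and the helper names are hypothetical; the dictionary follows the shape of a `GlobalSecondaryIndexes` entry for `create_table`, assuming an on-demand table (a provisioned table would also need `ProvisionedThroughput`).

```python
from datetime import datetime, timezone

# Hash-key granularities from coarse to fine; finer buckets spread writes over
# more hash key values at the cost of more queries when reading back in time.
BUCKET_FORMATS = {"month": "%Y-%m", "day": "%Y-%m-%d", "hour": "%Y-%m-%d-%H"}


def bucket_key(created: datetime, granularity: str = "day") -> str:
    """Leading part of the UTC timestamp, used as the GSI hash key."""
    return created.astimezone(timezone.utc).strftime(BUCKET_FORMATS[granularity])


def to_item(user_id: str, created: datetime, granularity: str = "day") -> dict:
    """Item in low-level attribute-value format, carrying the bucket attribute."""
    return {
        "userid": {"S": user_id},
        "date_created": {"S": created.astimezone(timezone.utc).isoformat()},
        "date_bucket": {"S": bucket_key(created, granularity)},
    }


gsi_definition = {
    "IndexName": "by_date_bucket",  # hypothetical index name
    "KeySchema": [
        {"AttributeName": "date_bucket", "KeyType": "HASH"},
        {"AttributeName": "date_created", "KeyType": "RANGE"},
    ],
    # Project as little as the reads need; "activity_type" is hypothetical.
    "Projection": {"ProjectionType": "INCLUDE", "NonKeyAttributes": ["activity_type"]},
}

if __name__ == "__main__":
    when = datetime(2015, 7, 24, 19, 7, tzinfo=timezone.utc)
    for g in BUCKET_FORMATS:
        print(g, bucket_key(when, g))  # month 2015-07, day 2015-07-24, hour 2015-07-24-19
    print(to_item("user-123", when))
```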

To query the most recent items, you first need to know which date prefix (the GSI hash key value) you want to read; the query then returns that prefix's records sorted by the full date.
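
Reading the most recent n items therefore means starting at the current bucket and walking backward one bucket at a time until enough items are collected. In the sketch below, `query` is a stand-in for a GSI query (for example a boto3 client `query` built from `gsi_query_params`), and `fake_query` is an in-memory substitute so the logic can run locally; both, along with the index and attribute names, are hypothetical. It assumes day-sized buckets and ignores pagination (`LastEvaluatedKey`), which a real implementation would need to handle since a single Query response is capped at 1 MB.

```python
from datetime import date, timedelta
from typing import Callable, Dict, List

# query(bucket, limit) -> up to `limit` items from that bucket, newest first
QueryFn = Callable[[str, int], List[str]]


def gsi_query_params(bucket: str, limit: int) -> dict:
    """Low-level Query parameters for one day bucket of the hypothetical GSI."""
    return {
        "TableName": "Activity",
        "IndexName": "by_date_bucket",
        "KeyConditionExpression": "date_bucket = :b",
        "ExpressionAttributeValues": {":b": {"S": bucket}},
        "ScanIndexForward": False,  # newest first within the bucket
        "Limit": limit,
    }


def most_recent(n: int, today: date, query: QueryFn, max_days: int = 30) -> List[str]:
    """Collect the n newest items, walking back one day bucket at a time."""
    results: List[str] = []
    day = today
    for _ in range(max_days):  # stop eventually even if data is sparse
        if len(results) >= n:
            break
        results.extend(query(day.isoformat(), n - len(results)))  # 'YYYY-MM-DD'
        day -= timedelta(days=1)
    return results[:n]


def fake_query(store: Dict[str, List[str]]) -> QueryFn:
    """In-memory stand-in: bucket -> list of ISO timestamps."""
    def query(bucket: str, limit: int) -> List[str]:
        return sorted(store.get(bucket, []), reverse=True)[:limit]
    return query


if __name__ == "__main__":
    store = {
        "2015-07-24": ["2015-07-24T10:00", "2015-07-24T09:00"],
        "2015-07-22": ["2015-07-22T08:00"],
    }
    print(most_recent(3, date(2015, 7, 24), fake_query(store)))
    # ['2015-07-24T10:00', '2015-07-24T09:00', '2015-07-22T08:00']
```

The `max_days` cap is a design choice for the sketch: without it, a user with little recent activity would make the loop walk back indefinitely through empty buckets.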

This is complicated in DynamoDB because it is a NoSQL system: behind the scenes, DynamoDB automatically shards data horizontally across more hardware as the data size and the required IOPS grow.
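
A toy model helps show why the hash key matters for this sharding: if each item is routed to a partition by hashing its hash key value, writes that all share one value land on a single partition, while many distinct values spread out. DynamoDB's real partitioning function and partition count are internal and are not shown here; `partition_for` (MD5 modulo a fixed partition count) is purely an illustrative assumption.

```python
import hashlib
from collections import Counter
from typing import Iterable


def partition_for(hash_key: str, num_partitions: int) -> int:
    """Toy router: MD5 of the hash key value, modulo the partition count."""
    digest = hashlib.md5(hash_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def load_per_partition(keys: Iterable[str], num_partitions: int = 4) -> Counter:
    """Count how many writes each toy partition would receive."""
    return Counter(partition_for(k, num_partitions) for k in keys)


if __name__ == "__main__":
    # 1,000 writes that all use today's bucket as the hash key: one hot partition
    print(load_per_partition(["2015-07-24"] * 1000))
    # 1,000 writes spread over distinct user ids: the load is shared
    print(load_per_partition(f"user-{i}" for i in range(1000)))
```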

The approach described above will work, but if you have a very large data set or need a very high number of IOPS (more than 1,000 writes per second), you may want to look into a different technology. Although DynamoDB lets you provision essentially unlimited reads and writes, it is possible to design GSIs that limit your performance, as the following passage from the DynamoDB documentation explains:

Consequently, to achieve the full amount of request throughput you have provisioned for a table, keep your workload spread evenly across the hash key values. Distributing requests across hash key values distributes the requests across partitions.
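
One common way to follow this advice with the date-bucket GSI is write sharding: append a small shard number to the bucket value so writes for the same period are spread across several hash key values, then query every shard on read and merge the newest-first results. This technique is named here for illustration and is not part of the answer above; `NUM_SHARDS`, the `#` separator, and the helper names are hypothetical, and the shard is derived deterministically from an item id with CRC32.

```python
import heapq
import itertools
import zlib
from typing import List

NUM_SHARDS = 4  # hypothetical; more shards spread writes further but cost more reads


def sharded_bucket(bucket: str, item_id: str, num_shards: int = NUM_SHARDS) -> str:
    """Hash key value for a write, e.g. '2015-07-24#2'."""
    shard = zlib.crc32(item_id.encode("utf-8")) % num_shards
    return f"{bucket}#{shard}"


def all_shards(bucket: str, num_shards: int = NUM_SHARDS) -> List[str]:
    """Every hash key value that must be queried to read one bucket."""
    return [f"{bucket}#{i}" for i in range(num_shards)]


def merge_newest(per_shard: List[List[str]], n: int) -> List[str]:
    """Merge per-shard results (each already newest-first) and keep the n newest."""
    return list(itertools.islice(heapq.merge(*per_shard, reverse=True), n))


if __name__ == "__main__":
    print(sharded_bucket("2015-07-24", "activity-42"))
    print(all_shards("2015-07-24"))
    print(merge_newest([["T09", "T03"], ["T10", "T01"], ["T05"]], 3))  # ['T10', 'T09', 'T05']
```

The trade-off is that each read fans out to `NUM_SHARDS` queries, so the shard count balances write throughput per period against read cost.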

JaredHatfield answered Nov 06 '22 at 13:11
