Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Global Vs Local Secondary Indexes in DynamoDB

I am still confused as to the use of Local Secondary Indexes. Please give me specific use cases for when there is a need for LSI vs GSI.

For example here, is the "GenreAlbumTitle" index supposed to be a GSI or LSI?https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.CoreComponents.html#HowItWorks.CoreComponents.PrimaryKey

I can't seem to get my head around the need for needing LSI because any indexes I need would be to cover the whole rows of the table, and not just specific for one partition alone. And if someone can also touch on the cost aspect because I understand LSI is cheaper (but why is it cheaper)?

Thank you all!

like image 628
Bluetoba Avatar asked Apr 28 '18 22:04

Bluetoba


2 Answers

Every item in Dynamo must have a unique primary key. The primary key is the base table index. A primary key must have a partition key and can optionally have a range key (also called a sort key). Within a partition, items are ordered by range key. Accessing items using a partition key is fast.

Secondary indexes allow you to query the table using an alternative key. A Local Secondary Index (LSI) has the same partition key as the primary key (index), but a different range key. The way to think about an LSI is that its the same data as the primary index (key), just ordered by a different attribute.

A Global Secondary Index (GSI) has a different partition key to the primary key and is therefore a different set of data.

One of the important differences between an LSI and GSI is that an LSI takes its throughput capacity from the base table, where as you have purchase GSI throughput capacity separately. Put another way, an LSI costs you nothing and a GSI incurs extra cost over your base table.

Lets have a look at the Music table example. Lets say the base table has this schema;

Artist: (Primary Key) Partition Key
SongTitle: (Primary Key) Range Key
AlbumTitle:
DateOfRelease:

This table is a list of songs. I can access all the songs for an artist really efficiently (i.e. query by Artist using the partition key). When I do this query the songs will be ordered by SongTitle. I can also access songs by Artist and SongTitle very efficiently using the unique primary key.

Now lets say I want to get all songs by an Artist but ordered by DateOfRelease. In the current schema I would need to get all the songs and then order them in my application. A good alternative would be to create a new index, with a partition key of Artist and a range key DateOfRelease. This will be a LSI because the partition key of the index (Artist) is the same as the partition key of the primary key. I do not need to purchase additional throughput capacity as this index will provision itself from the base table capacity.

Now lets say I want to access the songs by AlbumTitle, ordered by SongTitle, i.e. create lists of Albums. To do this efficiently I create a new index with partition key AlbumTitle and range key SongTitle. This is a GSI because the partition key is different to the primary key. This GSI must be provisioned separately to the base table and therefore costs extra.

In answer to your question, GenreAlbumTitle is a GSI because it has a different partition key to Music.

like image 171
F_SO_K Avatar answered Sep 18 '22 01:09

F_SO_K


There is some misconceptions about the costs of using LSI, so let me clarify here.

Using LSI is not free of charge. Just like GSI, dynamoDB needs to create and maintain additional partial copies of the table in order to quickly get the results. This maintenance of additional copies will incur additional read, write, and storage costs identical to that of GSI. (Additional cost will be written in bold). The only difference is that instead of allocating a separate pay plan, you use the same pay plan as the main table.

Before discussing the additional cost, let me again summarize what kind of information is stored in the partial copy table. The partial table copy (LSI) contains the partition key (same as original table), the sort key (a different one than the original table), and any additional projected attributes.

Original Table

Artist (Primary Key) Song title (sort key) Album Title Date Of Release
Michael Jackson Beat It Thriller December 1, 1982
Weeknd The Hills Beauty Behind the Madness May 27, 2015,

LSI

Artist (Primary Key) Album Title (sort key) Date Of Release
Michael Jackson Thriller December 1, 1982
Weeknd Beauty Behind the Madness May 27, 2015,

Projected attributes are the additional information we want to query from the LSI. I could say, "show me all of release dates of the albums by the Weeknd, ordered by the album name". As you can see, we don't care about the song title here, and is not included in our LSI projections.

Additional cost for reads:

  • You are charged for 1 Read Unit for queries that can be satisfied by using LSI table alone. Example: "Show me all of release dates of the songs by the Weeknd, ordered by the album name."

  • You are charged 1 additional Read Unit for queries that LSI doesn't know how to serve on its own, forcing it to go to the main table for help. This will cost a total of 2 Read Units. Example: "show me all of release dates AND the song titles of the songs by the Weeknd, ordered by the album name."

Additional cost for writes:

(Writes are done to the main table, with its own write costs, and changes are later propagated to LSI's)

  • If the update to the main table results in creation of a new row in the LSI => 1 additional Write Unit
  • If the update to the main table results in the key attribute of an existing row in the LSI to be updated => cost for deletion (1 unit) + creation (1 unit) = 2 additional Write Unit
  • If the update to the main table results in the non-key attribute of an existing row in the LSI to be updated => 1 additional Write Unit
  • If the update to the main table results in the deletion of an existing attribute of an existing row in the LSI => 1 additional Write Unit
  • If the update to the main table does not change any rows of LSI => 0 additional Write Unit

Additional cost for storage

  • You pay additional cost for: (size of index keys + size of projected attributes + overhead) x number of rows

As you can see, if we are not careful with LSI, extra costs can become overbearing. To minimize cost, you must:

  • Carefully consider your typical queries. Which types of information do you need?
  • There is tradeoff between read cost and storage cost. If you project every attribute to the LSI, than you will incur no extra read cost, but your storage cost will double. If you project only the key attributes, and you often fetch additional information other than the key attributes, there will be a lot of extra read costs from having to go back to the main table for help.
  • For tables that are write-heavy, expect to incur a huge uptick in the write cost. Remember, if the update to the main table updates the key attribute of an item in the LSI, you pay 2 additional write units, and for non-key attributes, 1 additional write unit.
like image 28
Martin Kim Avatar answered Sep 21 '22 01:09

Martin Kim