I am still confused as to the use of Local Secondary Indexes. Please give me specific use cases for when there is a need for LSI vs GSI.
For example here, is the "GenreAlbumTitle" index supposed to be a GSI or LSI?https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.CoreComponents.html#HowItWorks.CoreComponents.PrimaryKey
I can't seem to get my head around the need for needing LSI because any indexes I need would be to cover the whole rows of the table, and not just specific for one partition alone. And if someone can also touch on the cost aspect because I understand LSI is cheaper (but why is it cheaper)?
Thank you all!
Every item in Dynamo must have a unique primary key. The primary key is the base table index. A primary key must have a partition key and can optionally have a range key (also called a sort key). Within a partition, items are ordered by range key. Accessing items using a partition key is fast.
Secondary indexes allow you to query the table using an alternative key. A Local Secondary Index (LSI) has the same partition key as the primary key (index), but a different range key. The way to think about an LSI is that its the same data as the primary index (key), just ordered by a different attribute.
A Global Secondary Index (GSI) has a different partition key to the primary key and is therefore a different set of data.
One of the important differences between an LSI and GSI is that an LSI takes its throughput capacity from the base table, where as you have purchase GSI throughput capacity separately. Put another way, an LSI costs you nothing and a GSI incurs extra cost over your base table.
Lets have a look at the Music table example. Lets say the base table has this schema;
Artist: (Primary Key) Partition Key
SongTitle: (Primary Key) Range Key
AlbumTitle:
DateOfRelease:
This table is a list of songs. I can access all the songs for an artist really efficiently (i.e. query by Artist using the partition key). When I do this query the songs will be ordered by SongTitle. I can also access songs by Artist and SongTitle very efficiently using the unique primary key.
Now lets say I want to get all songs by an Artist but ordered by DateOfRelease. In the current schema I would need to get all the songs and then order them in my application. A good alternative would be to create a new index, with a partition key of Artist and a range key DateOfRelease. This will be a LSI because the partition key of the index (Artist) is the same as the partition key of the primary key. I do not need to purchase additional throughput capacity as this index will provision itself from the base table capacity.
Now lets say I want to access the songs by AlbumTitle, ordered by SongTitle, i.e. create lists of Albums. To do this efficiently I create a new index with partition key AlbumTitle and range key SongTitle. This is a GSI because the partition key is different to the primary key. This GSI must be provisioned separately to the base table and therefore costs extra.
In answer to your question, GenreAlbumTitle
is a GSI because it has a different partition key to Music
.
There is some misconceptions about the costs of using LSI, so let me clarify here.
Using LSI is not free of charge. Just like GSI, dynamoDB needs to create and maintain additional partial copies of the table in order to quickly get the results. This maintenance of additional copies will incur additional read, write, and storage costs identical to that of GSI. (Additional cost will be written in bold). The only difference is that instead of allocating a separate pay plan, you use the same pay plan as the main table.
Before discussing the additional cost, let me again summarize what kind of information is stored in the partial copy table. The partial table copy (LSI) contains the partition key (same as original table), the sort key (a different one than the original table), and any additional projected attributes.
Original Table
Artist (Primary Key) | Song title (sort key) | Album Title | Date Of Release |
---|---|---|---|
Michael Jackson | Beat It | Thriller | December 1, 1982 |
Weeknd | The Hills | Beauty Behind the Madness | May 27, 2015, |
LSI
Artist (Primary Key) | Album Title (sort key) | Date Of Release |
---|---|---|
Michael Jackson | Thriller | December 1, 1982 |
Weeknd | Beauty Behind the Madness | May 27, 2015, |
Projected attributes are the additional information we want to query from the LSI. I could say, "show me all of release dates of the albums by the Weeknd, ordered by the album name". As you can see, we don't care about the song title here, and is not included in our LSI projections.
Additional cost for reads:
You are charged for 1 Read Unit for queries that can be satisfied by using LSI table alone. Example: "Show me all of release dates of the songs by the Weeknd, ordered by the album name."
You are charged 1 additional Read Unit for queries that LSI doesn't know how to serve on its own, forcing it to go to the main table for help. This will cost a total of 2 Read Units. Example: "show me all of release dates AND the song titles of the songs by the Weeknd, ordered by the album name."
Additional cost for writes:
(Writes are done to the main table, with its own write costs, and changes are later propagated to LSI's)
Additional cost for storage
As you can see, if we are not careful with LSI, extra costs can become overbearing. To minimize cost, you must:
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With