
Differences between GSI and table

I am having trouble understanding what the difference is between a global secondary index and a table.

  • Why would I use a global secondary index, why not just create another table?
  • I have to specify read and write throughput for both. When a write occurs on a table with a GSI, I have to write to both the table and the index. My question, then, is: why not just create another table instead of a global secondary index?
  • What benefit do I get by using a GSI?
user2924127 asked Feb 09 '23 19:02


1 Answer

I'll take a stab at this.

One benefit is that DynamoDB maintains the index for you: the index is an eventually consistent view of the table's data, so a single write to the table can act as a sort of "transactional" update of both.

Imagine that you want to track user/group relationships. This might not be the best example, but I think it will demonstrate a few points.

Let's say your use cases are to Query all groups for a user and to Query all users for a group. In this simple setup, you might think of having 2 tables:

  1. UsersToGroups with hash+range of userId+groupId
  2. GroupsToUsers with hash+range of groupId+userId.

If you need to update any relationship, a client needs to:

  1. Write to the UsersToGroups table (hash: userId, range: groupId)
  2. Write to the GroupsToUsers table (hash: groupId, range: userId)

What happens if your 2nd write fails? How do you roll back the first write if the second fails? How do you even know whether your 2nd write failed, say, if a connection failure happens?

These problems are not fun to deal with.
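To make the dual-write problem concrete, here is a minimal sketch. `FakeTable`, `WriteFailed`, and `add_membership` are hypothetical names invented for this illustration; the table is an in-memory stand-in, not a DynamoDB client.

```python
class WriteFailed(Exception):
    """Raised by the stand-in table when a write does not go through."""


class FakeTable:
    """In-memory stand-in for a table keyed by (hash, range). Not a real client."""

    def __init__(self, fail_writes=False):
        self.items = set()
        self.fail_writes = fail_writes

    def put(self, hash_key, range_key):
        if self.fail_writes:
            raise WriteFailed("simulated connection failure")
        self.items.add((hash_key, range_key))

    def delete(self, hash_key, range_key):
        self.items.discard((hash_key, range_key))


def add_membership(users_to_groups, groups_to_users, user_id, group_id):
    """Write the relationship to both tables, undoing the first write on failure."""
    users_to_groups.put(user_id, group_id)
    try:
        groups_to_users.put(group_id, user_id)
    except WriteFailed:
        # Compensating delete. Against a real service this call could fail too,
        # which would leave the two tables disagreeing with each other.
        users_to_groups.delete(user_id, group_id)
        raise
```

Even in this toy version the client has to carry the rollback logic itself, and the rollback is only best effort.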

With a GSI, you could manage this with a single table. Instead of using 2 tables, let's say I use a single table and a single GSI:

  1. Table UsersToGroups with hash+range of userId+groupId
  2. GSI GroupsToUsers with hash+range of groupId+userId.
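As a rough sketch, the table and index above could be described with parameters shaped for boto3's DynamoDB client `create_table` call. The helper name, the string attribute types, the `KEYS_ONLY` projection, and the capacity numbers are all assumptions made for illustration.

```python
def users_to_groups_table_params(read_units=5, write_units=5):
    """Build keyword arguments shaped for a DynamoDB create_table request."""
    # Placeholder capacity; the table and the index are provisioned separately.
    throughput = {"ReadCapacityUnits": read_units, "WriteCapacityUnits": write_units}
    return {
        "TableName": "UsersToGroups",
        # Only attributes used in a key schema need to be declared here.
        "AttributeDefinitions": [
            {"AttributeName": "userId", "AttributeType": "S"},
            {"AttributeName": "groupId", "AttributeType": "S"},
        ],
        "KeySchema": [
            {"AttributeName": "userId", "KeyType": "HASH"},
            {"AttributeName": "groupId", "KeyType": "RANGE"},
        ],
        "ProvisionedThroughput": dict(throughput),
        "GlobalSecondaryIndexes": [
            {
                "IndexName": "GroupsToUsers",
                # Same two attributes with the roles swapped.
                "KeySchema": [
                    {"AttributeName": "groupId", "KeyType": "HASH"},
                    {"AttributeName": "userId", "KeyType": "RANGE"},
                ],
                "Projection": {"ProjectionType": "KEYS_ONLY"},
                "ProvisionedThroughput": dict(throughput),
            }
        ],
    }


# With AWS credentials configured, the table could be created with:
#   import boto3
#   boto3.client("dynamodb").create_table(**users_to_groups_table_params())
```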

If you need to update any relationship, a client needs to:

  1. Write to the UsersToGroups table (hash: userId, range: groupId)

That is it. You only have to make 1 request. If that write is successful, you can rely on the index (eventually) having the same data. Depending on how often you query this index, or how much data you need propagated to it, you can adjust its throughput accordingly.
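A minimal sketch of the single write and the reverse lookup, again as parameters shaped for boto3's DynamoDB client (`put_item` and `query`). The helper names and the string-typed keys are assumptions made for illustration.

```python
def put_membership_params(user_id, group_id):
    """Arguments for the one write: a put_item against the base table."""
    return {
        "TableName": "UsersToGroups",
        "Item": {"userId": {"S": user_id}, "groupId": {"S": group_id}},
    }


def users_in_group_params(group_id):
    """Arguments for a query against the GSI: all users in one group."""
    return {
        "TableName": "UsersToGroups",
        "IndexName": "GroupsToUsers",
        "KeyConditionExpression": "groupId = :g",
        "ExpressionAttributeValues": {":g": {"S": group_id}},
    }


# With a configured client this would look roughly like:
#   import boto3
#   client = boto3.client("dynamodb")
#   client.put_item(**put_membership_params("user-1", "group-9"))
#   client.query(**users_in_group_params("group-9"))
```

Because reads from a GSI are eventually consistent, a query issued right after the write may not include the new item yet.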

This simple example assumes that userIds and groupIds are unique and that no collisions will happen when they are projected to the index, but I think it does a good job of explaining at least some of the usefulness.

For more information, see the Guidelines for Global Secondary Indexes documentation.

mkobit answered Feb 12 '23 10:02