DynamoDB - Global Secondary Index on set items

I have a DynamoDB table with the following attributes:

  • id (Number, primary key)
  • title (String)
  • created_at (Number, long)
  • tags (StringSet containing tags such as android, ios, etc.)

I want to be able to query by tag, e.g. "get me all the items tagged android". How can I do that in DynamoDB? It appears that a global secondary index can be built only on scalar data types (such as Number and String) and not on the items inside a set.

If my approach is wrong, I'm also open to alternatives, such as creating different tables or changing the attributes.

Asked by 500865 on Nov 06 '15 01:11


People also ask

Can you add a global secondary index to an existing table?

To add a global secondary index to an existing table, use the UpdateTable operation with the GlobalSecondaryIndexUpdates parameter. Among other things, you must provide an index name, which must be unique among all the indexes on the table.
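As a rough illustration, here is a Python sketch that only assembles the keyword arguments for boto3's low-level `update_table` call (it does not contact AWS). The table name `Items`, index name `TagIndex`, and key attribute `tag` are hypothetical, and the sketch assumes an on-demand (PAY_PER_REQUEST) table; a provisioned table would also need a `ProvisionedThroughput` entry inside `Create`.

```python
def build_add_gsi_request(table_name, index_name, hash_attr, hash_type="S",
                          range_attr=None, range_type="N"):
    """Return kwargs for boto3's DynamoDB client.update_table(...) that create a GSI.

    Assumes an on-demand table; provisioned tables also need ProvisionedThroughput.
    """
    # Index key attributes must be scalar: S (String), N (Number) or B (Binary).
    allowed = {"S", "N", "B"}
    if hash_type not in allowed or (range_attr is not None and range_type not in allowed):
        raise ValueError("GSI key attributes must be of type S, N or B")

    attribute_definitions = [{"AttributeName": hash_attr, "AttributeType": hash_type}]
    key_schema = [{"AttributeName": hash_attr, "KeyType": "HASH"}]
    if range_attr is not None:
        attribute_definitions.append({"AttributeName": range_attr, "AttributeType": range_type})
        key_schema.append({"AttributeName": range_attr, "KeyType": "RANGE"})

    return {
        "TableName": table_name,
        "AttributeDefinitions": attribute_definitions,
        "GlobalSecondaryIndexUpdates": [
            {
                "Create": {
                    "IndexName": index_name,  # must be unique among the table's indexes
                    "KeySchema": key_schema,
                    "Projection": {"ProjectionType": "ALL"},
                }
            }
        ],
    }


request = build_add_gsi_request("Items", "TagIndex", "tag", "S", "id", "N")
# With boto3 installed and credentials configured, this would be sent as:
#   boto3.client("dynamodb").update_table(**request)
```

The type check mirrors the restriction discussed in this thread: a StringSet (`SS`) attribute cannot be an index key.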

Do global secondary indexes need to be unique?

In a DynamoDB table, each key value must be unique. However, the key values in a global secondary index do not need to be unique.

How many global secondary indexes are allowed per table?

For maximum query flexibility, you can create up to 20 global secondary indexes (default quota) and up to 5 local secondary indexes per table.


3 Answers

  • DynamoDB cannot index the values inside a set. Below is the relevant excerpt from Amazon's documentation (Improving Data Access with Secondary Indexes in DynamoDB):

The key schema for the index. Every attribute in the index key schema must be a top-level attribute of type String, Number, or Binary. Nested attributes and multi-valued sets are not allowed. Other requirements for the key schema depend on the type of index: For a global secondary index, the hash attribute can be any scalar table attribute. A range attribute is optional, and it too can be any scalar table attribute. For a local secondary index, the hash attribute must be the same as the table's hash attribute, and the range attribute must be a non-key table attribute.

  • Amazon recommends creating a separate one-to-many table for this kind of problem. More info here: Use one to many tables
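A minimal in-memory sketch of that one-to-many layout in Python: each tag in an item's set becomes its own row in a separate table keyed by tag (partition key) and id (sort key), so "all items tagged android" becomes a lookup on a single partition. The `TagTable` class is a hypothetical stand-in for a real table; against DynamoDB each row would be written with PutItem and read back with Query.

```python
from collections import defaultdict


def fan_out_tags(item):
    """Turn one item with a tag set into one (tag, id) row per tag."""
    return [{"tag": tag, "id": item["id"]} for tag in sorted(item["tags"])]


class TagTable:
    """In-memory stand-in for a hypothetical table with hash key tag, range key id."""

    def __init__(self):
        self._partitions = defaultdict(dict)

    def put(self, row):
        # Writing the same tag + id again overwrites, like PutItem on the same key.
        self._partitions[row["tag"]][row["id"]] = row

    def query(self, tag):
        # Like Query on the partition key: rows come back ordered by the sort key.
        partition = self._partitions.get(tag, {})
        return [partition[i] for i in sorted(partition)]


items = [
    {"id": 1, "title": "first", "tags": {"android"}},
    {"id": 2, "title": "second", "tags": {"ios"}},
    {"id": 3, "title": "third", "tags": {"android", "ios"}},
]

tag_table = TagTable()
for item in items:
    for row in fan_out_tags(item):
        tag_table.put(row)

android_ids = [row["id"] for row in tag_table.query("android")]
```

The tag table only stores ids here, so fetching the full items still means a second lookup against the main table (or copying the needed attributes into each tag row).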
Answered by 500865 on Oct 09 '22 14:10


This is a really old post, sorry to revive it, but I'd take a look at "Single Table Design".

Basically, stop thinking about your data as structured, relational data and embrace denormalization.

You have: id (Number, primary key), title (String), created_at (Number, long), and tags (StringSet of tags such as android, ios, etc.).

Instead of a NoSQL table with a "header" like this:
id|title|created_at|tags

Think of it like this:

pk|sk    |data....
id|id    |{title, created_at}
id|id+tag|{id, tag} <- create one record per tag

You can still return everything for an item by querying for pk = id and sk begins_with id, then joining the tag records to the id record in your application logic.

You can also use a GSI to project id|id+tag into tag|id. Getting the items for a given tag will still take two queries (first get the ids, then get the items), but you won't have to duplicate your data, you won't have to scan, and you'll still be able to get your items in one query when your access pattern doesn't rely on tags.
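The layout above can be sketched in Python with a plain list standing in for the table and list filtering standing in for queries. Assumptions: the `#` separator in the id+tag sort key, the attribute name `gsi1pk` for the index's partition key, and the helper names are all hypothetical. Only the tag records carry `gsi1pk`, so the index is sparse and holds exactly the tag → id mappings. Against real DynamoDB, step 1 would be a Query on the index and step 2 a GetItem or BatchGetItem on the base records.

```python
def to_records(item):
    """Denormalize one item into a base record plus one record per tag."""
    item_id = str(item["id"])
    records = [{
        "pk": item_id,
        "sk": item_id,
        "title": item["title"],
        "created_at": item["created_at"],
    }]
    for tag in sorted(item["tags"]):
        records.append({
            "pk": item_id,
            "sk": f"{item_id}#{tag}",  # id+tag composite sort key
            "gsi1pk": tag,             # hypothetical GSI partition key: the tag
            "id": item["id"],
            "tag": tag,
        })
    return records


def query_item(table, item_id):
    """Like Query(pk = id AND begins_with(sk, id)): base record plus its tag records."""
    key = str(item_id)
    return [r for r in table if r["pk"] == key and r["sk"].startswith(key)]


def items_with_tag(table, tag):
    """Two steps: read ids from the sparse GSI, then fetch each base record."""
    ids = sorted(r["id"] for r in table if r.get("gsi1pk") == tag)  # step 1: GSI query
    by_key = {(r["pk"], r["sk"]): r for r in table}
    return [by_key[(str(i), str(i))] for i in ids]                  # step 2: get items


table = []
for item in [
    {"id": 1, "title": "first", "created_at": 1000, "tags": {"android"}},
    {"id": 2, "title": "second", "created_at": 2000, "tags": {"android", "ios"}},
    {"id": 3, "title": "third", "created_at": 3000, "tags": {"ios"}},
]:
    table.extend(to_records(item))

android_titles = [r["title"] for r in items_with_tag(table, "android")]
```

Each item's data lives only in its base record; the tag records carry just the keys needed to find it.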

FWIW, I'd start by listing all of your access patterns, and from there work out how to structure your composite keys and/or GSIs.

cheers

Answered by Schalton on Oct 09 '22 14:10


You will need to create a separate table for this query. If you want to fetch all items by tag, I suggest keeping a table with a composite primary key:
hash (partition) key: tag
range (sort) key: id

This way you can use a very simple Query to fetch all items by tag.
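For reference, that Query could be expressed with boto3's low-level client roughly as below. The sketch only builds the keyword arguments (no AWS call); the table name `ItemTags` and the helper name are hypothetical, and the `#t` placeholder is used to stay clear of any reserved-word clash on the attribute name.

```python
def build_tag_query(tag, table_name="ItemTags", descending=False, start_key=None):
    """Return kwargs for boto3's DynamoDB client.query(...) listing ids for one tag.

    Assumes a hypothetical table keyed by tag (hash, String) and id (range, Number).
    """
    request = {
        "TableName": table_name,
        "KeyConditionExpression": "#t = :tag",
        "ExpressionAttributeNames": {"#t": "tag"},
        "ExpressionAttributeValues": {":tag": {"S": tag}},
        # Results are ordered by the range key (id); False returns them in reverse.
        "ScanIndexForward": not descending,
    }
    if start_key is not None:
        # Continue from the LastEvaluatedKey returned by the previous page.
        request["ExclusiveStartKey"] = start_key
    return request


first_page = build_tag_query("android")
# With boto3: boto3.client("dynamodb").query(**first_page)
```

The matching rows come back in the response's `Items`; a single Query returns at most 1 MB, so if the response includes `LastEvaluatedKey`, pass it back as `start_key` to read the next page.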

Answered by Chen Harel on Oct 09 '22 13:10