 

DynamoDB Query with multiple tags

I am rather new to DynamoDB. We are currently thinking about migrating an existing project to a serverless application using DynamoDB, and we want to adapt the following setup from an RDBMS:

Tables:

  • Projects (ProjectID)
  • Files (FileID, ProjectID, Filename)
  • Tags (FileID, Tag)

We want to query DynamoDB for all Files of a specific Project (by ProjectID) that have one or more given Tags (by Tag). In an RDBMS this query would be simple, something like:

SELECT DISTINCT Files.* FROM Files JOIN Tags ON Tags.FileID = Files.FileID WHERE Files.ProjectID = ?PROJECT AND Tags.Tag IN (?TAG_1, ?TAG_2, ...)
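To pin down the intended semantics, here is a minimal sketch of the relational version using Python's built-in sqlite3 module. The sample rows and the helper name files_with_any_tag are hypothetical. The sketch writes the tag condition as `Tags.Tag IN (...)` (a bare `OR ?TAG_2` would bind looser than the `AND`) and uses `DISTINCT`, because a file matching several tags would otherwise be returned once per matching tag.

```python
import sqlite3

# In-memory database with the relational schema from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Projects (ProjectID TEXT PRIMARY KEY);
CREATE TABLE Files (FileID TEXT PRIMARY KEY, ProjectID TEXT, Filename TEXT);
CREATE TABLE Tags (FileID TEXT, Tag TEXT, PRIMARY KEY (FileID, Tag));
""")

# Hypothetical sample data.
conn.execute("INSERT INTO Projects VALUES ('p1'), ('p2')")
conn.executemany("INSERT INTO Files VALUES (?, ?, ?)", [
    ("f1", "p1", "a.txt"),
    ("f2", "p1", "b.txt"),
    ("f3", "p1", "c.txt"),
    ("f4", "p2", "d.txt"),
])
conn.executemany("INSERT INTO Tags VALUES (?, ?)", [
    ("f1", "red"), ("f1", "blue"),
    ("f2", "blue"),
    ("f3", "green"),
    ("f4", "red"),
])


def files_with_any_tag(conn, project_id, tags):
    """Files of one project carrying at least one of the given tags."""
    placeholders = ", ".join("?" for _ in tags)
    sql = (
        "SELECT DISTINCT Files.FileID, Files.Filename FROM Files "
        "JOIN Tags ON Tags.FileID = Files.FileID "
        f"WHERE Files.ProjectID = ? AND Tags.Tag IN ({placeholders}) "
        "ORDER BY Files.FileID"
    )
    return conn.execute(sql, [project_id, *tags]).fetchall()


print(files_with_any_tag(conn, "p1", ["red", "blue"]))  # [('f1', 'a.txt'), ('f2', 'b.txt')]
```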

At the moment, we have the following DynamoDB setup (but it can still be changed):

  • Projects (ProjectID [HashKey], ...)
  • Files (ProjectID [HashKey], FileID [RangeKey], ...)
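With this key schema, fetching every file of one project is a single Query on the ProjectID partition; tags cannot be used there, because a KeyConditionExpression only accepts key attributes. A minimal sketch of the low-level request (the shape accepted by boto3's `client.query`) plus a pagination loop, assuming string-typed keys. The helper names are hypothetical, and `query` is any callable with the `client.query` signature, so the sketch runs without AWS.

```python
from typing import Any, Callable, Dict, Iterator


def files_for_project_params(project_id: str) -> Dict[str, Any]:
    """Low-level Query request for every item in one ProjectID partition."""
    return {
        "TableName": "Files",
        # Only key attributes (ProjectID, optionally FileID) may appear here.
        "KeyConditionExpression": "#pid = :pid",
        "ExpressionAttributeNames": {"#pid": "ProjectID"},
        "ExpressionAttributeValues": {":pid": {"S": project_id}},
    }


def query_all_pages(
    query: Callable[..., Dict[str, Any]], params: Dict[str, Any]
) -> Iterator[Dict[str, Any]]:
    """Follow LastEvaluatedKey until DynamoDB reports no further pages."""
    request = dict(params)
    while True:
        response = query(**request)
        yield from response.get("Items", [])
        last_key = response.get("LastEvaluatedKey")
        if not last_key:
            return
        request["ExclusiveStartKey"] = last_key
```

Against a real table this would be called as `list(query_all_pages(boto3.client("dynamodb").query, files_for_project_params("p1")))`.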

Please also consider that the number of projects is large (between 1,000 and 30,000), as is the number of files per project (between 50 and 100,000), and the query should be really fast.

How can this be achieved with a DynamoDB Query, preferably without filter expressions, since those are applied only after the items have been read? It would be perfect if the Files table could have a StringSet column Tags, but I guess this cannot be used for an efficient Query (i.e. without falling back to a Scan), since DynamoDB key attributes, including index keys, can only be of type String, Number or Binary and not StringSet. Is this perhaps an applicable use case for a Global Secondary Index (GSI)?
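For comparison, this is roughly what the filter-expression approach the question wants to avoid would look like, assuming a hypothetical StringSet attribute `Tags` on each Files item. The Query still reads every item in the project's partition, and the `contains(...)` filter only removes non-matching items afterwards, so read capacity is consumed for all items read. The helper name is hypothetical; the dict is the low-level request shape for boto3's `client.query`.

```python
from typing import Any, Dict, List


def tagged_files_filter_params(project_id: str, tags: List[str]) -> Dict[str, Any]:
    """Query one project partition, then filter on a StringSet 'Tags' attribute."""
    if not tags:
        raise ValueError("at least one tag is required")
    values: Dict[str, Any] = {":pid": {"S": project_id}}
    conditions = []
    for i, tag in enumerate(tags):
        placeholder = f":t{i}"
        values[placeholder] = {"S": tag}
        # contains() on a set checks set membership.
        conditions.append(f"contains(#tags, {placeholder})")
    return {
        "TableName": "Files",
        "KeyConditionExpression": "#pid = :pid",
        # Applied after the partition's items are read, not during key lookup.
        "FilterExpression": " OR ".join(conditions),
        "ExpressionAttributeNames": {"#pid": "ProjectID", "#tags": "Tags"},
        "ExpressionAttributeValues": values,
    }
```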

asked Mar 08 '17 13:03 by Tom


People also ask

Can DynamoDB have multiple hash keys?

With the standard DynamoDB operations, a single GetItem or Query request targets only one hash key, while a Scan reads across all hash keys at once. (BatchGetItem can fetch items with different hash keys in one call, but only by their full primary keys.)

Does DynamoDB support complex queries?

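DynamoDB has many attractive features. For example, it can automatically scale to handle trillions of calls in a 24-hour period. It can be used as a key-value store or a document database. It does not support joins or ad-hoc queries the way SQL does, though: complex access patterns are supported by designing the table's keys and indexes around them up front, and queries that follow that design are served with consistently low latency.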

Which is faster scan or query in DynamoDB?

For faster response times, design your tables and indexes so that your applications can use Query instead of Scan. (For tables, you can also consider using the GetItem and BatchGetItem APIs.) Alternatively, design your application to use Scan operations in a way that minimizes the impact on your request rate.

What is the difference between DynamoDB scan and query?

DynamoDB offers two ways to access stored information: Query and Scan. A Query relies on the primary key to find information. It can point directly to a particular item (or set of items) and retrieve them quickly and efficiently. A Scan, as the name suggests, browses the table's items from start to finish.


1 Answer

A bit late; I just saw this question referenced from another one.

I guess you went and solved it something like this?

DynamoDB tables

  • Projects (ProjectID [HashKey], ...)
  • Files (ProjectID [HashKey], FileID [RangeKey], ...)
  • Tags (Tag [HashKey], FileID [RangeKey], ProjectID [LSI Sort Key])
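The Tags table described in the list above could be defined roughly as follows: a minimal sketch of the `create_table` request for boto3's low-level client, assuming string-typed attributes, on-demand billing and a hypothetical index name `TagProjectIndex`. An LSI must reuse the table's partition key (Tag) and can only be created together with the table. The hypothetical helper `tag_items` shows the denormalization: one Tags item per (file, tag) pair, each carrying a copy of ProjectID.

```python
from typing import Any, Dict, List


def tags_table_definition() -> Dict[str, Any]:
    """create_table request: PK (Tag, FileID), LSI (Tag, ProjectID)."""
    return {
        "TableName": "Tags",
        "AttributeDefinitions": [
            {"AttributeName": "Tag", "AttributeType": "S"},
            {"AttributeName": "FileID", "AttributeType": "S"},
            {"AttributeName": "ProjectID", "AttributeType": "S"},
        ],
        "KeySchema": [
            {"AttributeName": "Tag", "KeyType": "HASH"},
            {"AttributeName": "FileID", "KeyType": "RANGE"},
        ],
        "LocalSecondaryIndexes": [
            {
                "IndexName": "TagProjectIndex",  # hypothetical name
                "KeySchema": [
                    {"AttributeName": "Tag", "KeyType": "HASH"},
                    {"AttributeName": "ProjectID", "KeyType": "RANGE"},
                ],
                # Keys only: Tag, FileID and ProjectID are available from the index.
                "Projection": {"ProjectionType": "KEYS_ONLY"},
            }
        ],
        "BillingMode": "PAY_PER_REQUEST",
    }


def tag_items(file_id: str, project_id: str, tags: List[str]) -> List[Dict[str, Any]]:
    """One Tags item per tag, each duplicating the file's ProjectID."""
    return [
        {"Tag": {"S": tag}, "FileID": {"S": file_id}, "ProjectID": {"S": project_id}}
        for tag in tags
    ]
```

One design caveat worth checking against the DynamoDB documentation: on a table with an LSI, all items sharing one partition key value (here, every file with a given tag across all projects) form an item collection with a size limit, so very popular tags may favor a GSI keyed on (Tag, ProjectID) instead.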

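In the Tags table, you need the FileID to make the primary key unique, but you can add the ProjectID as the sort key of a Local Secondary Index, so you can query on Tag + ProjectID. For multiple tags, run one such Query per tag and combine the resulting FileIDs in your application: union them for "any of these tags", intersect them for "all of these tags".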
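A minimal sketch of the per-tag lookup via the Tag + ProjectID index, assuming the hypothetical index name `TagProjectIndex` and string-typed keys. `query` is any callable with the signature of boto3's low-level `client.query` (so the sketch can run against a fake); pagination follows `LastEvaluatedKey`. The helper names are hypothetical.

```python
from typing import Any, Callable, Dict, List, Set

INDEX_NAME = "TagProjectIndex"  # hypothetical LSI name

QueryFn = Callable[..., Dict[str, Any]]


def tag_project_query_params(tag: str, project_id: str) -> Dict[str, Any]:
    """Query the LSI on (Tag, ProjectID), returning only FileIDs."""
    return {
        "TableName": "Tags",
        "IndexName": INDEX_NAME,
        "KeyConditionExpression": "#tag = :tag AND #pid = :pid",
        "ExpressionAttributeNames": {"#tag": "Tag", "#pid": "ProjectID", "#fid": "FileID"},
        "ExpressionAttributeValues": {":tag": {"S": tag}, ":pid": {"S": project_id}},
        "ProjectionExpression": "#fid",
    }


def file_ids_for_tag(query: QueryFn, tag: str, project_id: str) -> Set[str]:
    """All FileIDs with this tag in this project, following pagination."""
    request = tag_project_query_params(tag, project_id)
    ids: Set[str] = set()
    while True:
        response = query(**request)
        ids.update(item["FileID"]["S"] for item in response.get("Items", []))
        last_key = response.get("LastEvaluatedKey")
        if not last_key:
            return ids
        request["ExclusiveStartKey"] = last_key


def file_ids_for_tags(
    query: QueryFn, project_id: str, tags: List[str], match_all: bool = False
) -> Set[str]:
    """One Query per tag; union for 'any tag', intersection for 'all tags'."""
    per_tag = [file_ids_for_tag(query, tag, project_id) for tag in tags]
    if not per_tag:
        return set()
    return set.intersection(*per_tag) if match_all else set.union(*per_tag)
```

The per-tag queries are independent, so they could also be issued concurrently. To get the full File items, the resulting IDs can then be fetched from the Files table with BatchGetItem using (ProjectID, FileID) as keys, at most 100 keys per request.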

It's a form of data denormalization, but that's what it takes to go NoSQL :-(. For example, if a File is moved to another Project, you'll need to update the ProjectID not only on the File but also on all of its Tags.
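A minimal sketch of such a move as one TransactWriteItems request, so the file and its tags change together or not at all. Because ProjectID is the partition key of Files, "updating" it means deleting the old item and putting a new one; on the Tags items it is a plain attribute (the LSI is maintained automatically). The helper name is hypothetical, it assumes the caller already knows the file's tags, and it assumes the file has few enough tags to fit within the per-call action limit of TransactWriteItems.

```python
from typing import Any, Dict, List


def move_file_transaction(
    file_item: Dict[str, Any], new_project_id: str, tags: List[str]
) -> List[Dict[str, Any]]:
    """TransactItems that move a Files item and re-point its Tags items."""
    old_project_id = file_item["ProjectID"]["S"]
    if old_project_id == new_project_id:
        # Delete and Put would target the same item, which a transaction rejects.
        raise ValueError("file is already in that project")
    file_id = file_item["FileID"]["S"]
    new_item = dict(file_item)  # shallow copy; the caller's dict is left untouched
    new_item["ProjectID"] = {"S": new_project_id}
    actions: List[Dict[str, Any]] = [
        {"Delete": {"TableName": "Files",
                    "Key": {"ProjectID": {"S": old_project_id}, "FileID": {"S": file_id}}}},
        {"Put": {"TableName": "Files", "Item": new_item}},
    ]
    for tag in tags:
        actions.append({
            "Update": {
                "TableName": "Tags",
                "Key": {"Tag": {"S": tag}, "FileID": {"S": file_id}},
                "UpdateExpression": "SET #pid = :pid",
                "ExpressionAttributeNames": {"#pid": "ProjectID"},
                "ExpressionAttributeValues": {":pid": {"S": new_project_id}},
            }
        })
    return actions
```

With boto3 this list would be passed as `client.transact_write_items(TransactItems=...)`.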

answered Sep 28 '22 08:09 by GeertPt