Should DynamoDB adjacency lists use discrete partition keys to model each type of relationship?

Context

I am building a forum and investigating modeling the data with DynamoDB and adjacency lists. Some top-level entities (like users) might have multiple types of relationships with other top-level entities (like comments).

Requirements

For example, let's say we want to be able to do the following:

Users can like comments
Users can follow comments
Comments can display users that like it
Comments can display users that follow it
User profiles can show comments they like
User profiles can show comments they follow

So, we essentially have a many-to-many (user <=> comment) to many (like or follow).

Note: This example is deliberately stripped down. In practice there will be many more relationships to model, so I'm trying to think of something extensible here.

Baseline

The following top-level data would likely be common in any adjacency list representation:

First_id(Partition key)         Second_id(Sort Key)         Data
-------------                   ----------                  ------
User-Harry                      User-Harry                  User data
User-Ron                        User-Ron                    User data
User-Hermione                   User-Hermione               User data
Comment-A                       Comment-A                   Comment data
Comment-B                       Comment-B                   Comment data
Comment-C                       Comment-C                   Comment data

Furthermore, for each table below, there would be an equivalent Global Secondary Index with the partition and sort keys swapped.
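As a rough sketch of that baseline, the table and its inverted index could be described with parameters like the ones below. The attribute names First_id and Second_id come from the table above; the table name "Forum", the index name "Inverted", and the on-demand billing mode are placeholders chosen for illustration. The dictionary is shaped for the low-level boto3 create_table call, but nothing here talks to AWS.

```python
def adjacency_table_definition(table_name="Forum", index_name="Inverted"):
    """Build create_table parameters for an adjacency-list table.

    "Forum" and "Inverted" are hypothetical names, not from the question.
    """
    return {
        "TableName": table_name,
        "BillingMode": "PAY_PER_REQUEST",  # assumption: on-demand capacity
        "AttributeDefinitions": [
            {"AttributeName": "First_id", "AttributeType": "S"},
            {"AttributeName": "Second_id", "AttributeType": "S"},
        ],
        # Base table: First_id is the partition key, Second_id the sort key.
        "KeySchema": [
            {"AttributeName": "First_id", "KeyType": "HASH"},
            {"AttributeName": "Second_id", "KeyType": "RANGE"},
        ],
        # GSI: the same two attributes with their roles swapped.
        "GlobalSecondaryIndexes": [
            {
                "IndexName": index_name,
                "KeySchema": [
                    {"AttributeName": "Second_id", "KeyType": "HASH"},
                    {"AttributeName": "First_id", "KeyType": "RANGE"},
                ],
                "Projection": {"ProjectionType": "ALL"},
            }
        ],
    }


definition = adjacency_table_definition()
# With boto3 installed and credentials configured, this could be passed as:
#   boto3.client("dynamodb").create_table(**definition)
```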

Example Data

This is what I would like to model in DynamoDB:

Harry likes comment A
Harry likes comment B
Harry follows comment A
Ron likes comment B
Hermione follows comment C

Option 1

Use a third attribute to define the type of relationship:

First_id(Partition key)         Second_id(Sort Key)         Data
-------------                   ----------                  ------
Comment-A                       User-Harry                  "LIKES"
Comment-B                       User-Harry                  "LIKES"
Comment-A                       User-Harry                  "FOLLOWS"
Comment-B                       User-Ron                    "LIKES"
Comment-C                       User-Hermione               "FOLLOWS"

The downside to this approach is that query results contain redundant information: they return extra items you may not care about. For example, if you want to query all the users that like a given comment, you also have to process all the users that follow that comment. Likewise, if you want to query all the comments that a user likes, you also have to process all the comments that the user follows.

Option 2

Modify the keys to represent the relationship:

First_id(Partition key)         Second_id(Sort Key)
-------------                   ----------
LikeComment-A                   LikeUser-Harry
LikeComment-B                   LikeUser-Harry
FollowComment-A                 FollowUser-Harry
LikeComment-B                   LikeUser-Ron
FollowComment-C                 FollowUser-Hermione

This makes it efficient to query independently:

Comment likes
Comment follows
User likes
User follows

The downside is that the same top-level entity now has multiple keys, which might make things complex as more relationships are added.
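To make that downside concrete, here is a small sketch of how Option 2 keys might be generated. The helper name option2_keys is hypothetical; the key format follows the table above. Every relationship type adds another partition key for the same comment.

```python
def option2_keys(relation, comment, user):
    """Return (partition key, sort key) in the Option 2 format.

    Hypothetical helper: the relationship name is prefixed to both keys.
    """
    return (f"{relation}Comment-{comment}", f"{relation}User-{user}")


relations = ["Like", "Follow"]

# Comment A ends up under one partition key per relationship type.
comment_a_partitions = [option2_keys(r, "A", "Harry")[0] for r in relations]
print(comment_a_partitions)  # ['LikeComment-A', 'FollowComment-A']
```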

Option 3

Skip adjacency lists altogether and use separate tables, maybe one for Users, one for Likes, and one for Follows.

Option 4

Traditional relational database. While I'm not planning on going this route because this is a personal project and I want to explore DynamoDB, if this is the right way to think about things, I'd love to hear why.

Conclusion

Thanks for reading this far! If there is anything I can do to simplify the question or clarify anything, please let me know :)

I've looked at the AWS best practices and this many-to-many SO post and neither appears to address the many-to-many (with many) relationship, so any resources or guidance greatly appreciated.


asked Nov 18 '18 06:11

user3268232

1 Answer

Your Option 1 is not possible because it does not have unique primary keys. In your sample data, you can see that you have two entries for (Comment-A, User-Harry).
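The collision can be illustrated without touching AWS. The sketch below uses a plain Python dict as a stand-in for a table, keyed by (partition key, sort key), which mirrors how a put to an existing primary key replaces the stored item. The put_item function here is a local toy, not the boto3 call.

```python
# In-memory stand-in for a table: at most one item per (partition key, sort key).
table = {}


def put_item(first_id, second_id, data):
    """Toy PutItem: writing to an existing primary key replaces the item."""
    table[(first_id, second_id)] = data


put_item("Comment-A", "User-Harry", "LIKES")
put_item("Comment-A", "User-Harry", "FOLLOWS")  # same key, so "LIKES" is replaced

print(table)  # {('Comment-A', 'User-Harry'): 'FOLLOWS'}
```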

Solution 1

The way to implement what you are looking for is by using slightly different attributes for your table and the GSI. If Harry likes Comment A, then your attributes should be:

hash_key: User-Harry
gsi_hash_key: Comment-A
sort_key_for_both: Likes-User-Harry-Comment-A

Now you have only one partition key value for your top level entities in both the table and the GSI, and you can query for a specific relationship type by using the begins_with operator.
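A minimal sketch of that layout, assuming the sort key attribute is simply named sort_key (the listing above calls it sort_key_for_both) and using a hypothetical table name "Forum" and GSI name "ByComment". The functions only build the item and the parameters for the low-level boto3 query call; they do not contact DynamoDB.

```python
def like_item(user, comment):
    """Item for "user likes comment" in low-level attribute-value format."""
    return {
        "hash_key": {"S": f"User-{user}"},
        "gsi_hash_key": {"S": f"Comment-{comment}"},
        "sort_key": {"S": f"Likes-User-{user}-Comment-{comment}"},
    }


def relationship_query(table_name, key_attr, entity, relation, index_name=None):
    """Query parameters for one entity, restricted to one relationship type."""
    params = {
        "TableName": table_name,
        "KeyConditionExpression": "#pk = :pk AND begins_with(#sk, :prefix)",
        "ExpressionAttributeNames": {"#pk": key_attr, "#sk": "sort_key"},
        "ExpressionAttributeValues": {
            ":pk": {"S": entity},
            ":prefix": {"S": f"{relation}-"},
        },
    }
    if index_name is not None:
        params["IndexName"] = index_name  # query the GSI instead of the table
    return params


# Comments that Harry likes: query the base table.
user_likes = relationship_query("Forum", "hash_key", "User-Harry", "Likes")

# Users that like Comment A: query the (hypothetical) "ByComment" GSI.
comment_likes = relationship_query(
    "Forum", "gsi_hash_key", "Comment-A", "Likes", index_name="ByComment"
)
# With boto3: boto3.client("dynamodb").query(**user_likes)
```

Because "Follows-..." sort keys do not start with "Likes-", the key condition excludes them before any items are returned.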

Solution 2

You could make the relationship a top-level entity. For example, Likes-User-Harry-Comment-A would have two entries in the database because it is “adjacent to” both User-Harry and Comment-A.

This allows you flexibility if you want to model more complex information about the relationships in the future (including the ability to describe the relationship between relationships, such as Likes-User-Ron-User-Harry Causes Follows-User-Ron-User-Harry).

However, this strategy requires more items to be stored in the database, and it means that saving a “like” (so that it can be queried) is not an atomic operation. (You can work around that by writing only the relationship entity and then using DynamoDB Streams + Lambda to write the two entries mentioned at the beginning of this solution.)

Update: using DynamoDB Transactions, saving a "like" in this manner can actually be a fully ACID operation.
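One way such a transaction might be assembled is sketched below. It assumes the First_id/Second_id attribute names from the question, a hypothetical table name "Forum", and a particular orientation for the adjacency entries (entity as partition key, relationship as sort key). The answer does not spell out that orientation, so treat it as a guess. The function only builds the TransactItems list for the boto3 transact_write_items call.

```python
def like_transaction(table_name, user, comment):
    """Build Put operations for a "like" stored as a top-level entity.

    Assumed layout (not confirmed by the answer): one item for the
    relationship itself, plus one adjacency entry under each endpoint.
    """
    relationship = f"Likes-User-{user}-Comment-{comment}"
    key_pairs = [
        (relationship, relationship),           # the relationship entity
        (f"User-{user}", relationship),         # adjacent to the user
        (f"Comment-{comment}", relationship),   # adjacent to the comment
    ]
    return [
        {
            "Put": {
                "TableName": table_name,
                "Item": {
                    "First_id": {"S": first_id},
                    "Second_id": {"S": second_id},
                },
            }
        }
        for first_id, second_id in key_pairs
    ]


items = like_transaction("Forum", "Harry", "A")
# With boto3, all three writes succeed or fail together:
#   boto3.client("dynamodb").transact_write_items(TransactItems=items)
```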


answered Oct 21 '22 04:10

Matthew Pope
