I am building a forum and investigating modeling the data with DynamoDB and adjacency lists. Some top-level entities (like users) might have multiple types of relationships with other top-level entities (like comments).
For example, let's say we want to be able to do the following:
So, we essentially have a many-to-many (user <=> comment) to many (like or follow).
Note: This example is deliberately stripped down, and in practice there will be many more relationships to model, so I'm trying to think of something extensible here.
The following top-level data would likely be common in any adjacency list representation:
First_id (Partition key)   Second_id (Sort key)   Data
------------------------   --------------------   ------------
User-Harry                 User-Harry             User data
User-Ron                   User-Ron               User data
User-Hermione              User-Hermione          User data
Comment-A                  Comment-A              Comment data
Comment-B                  Comment-B              Comment data
Comment-C                  Comment-C              Comment data
Furthermore, for each table below, there would be an equivalent Global Secondary Index with the partition and sort keys swapped.
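As a rough sketch of that layout, the request parameters for DynamoDB's CreateTable operation could look like the following. The table name `ForumGraph` and index name `InvertedIndex` are hypothetical, and on-demand billing is only an assumption to keep the example short; the key attribute names follow the tables in this question.

```python
# Builds CreateTable request parameters: a table keyed on (First_id, Second_id)
# plus a Global Secondary Index with the two keys swapped.
# "ForumGraph" and "InvertedIndex" are hypothetical names chosen for this sketch.
def build_create_table_params(table_name="ForumGraph", index_name="InvertedIndex"):
    return {
        "TableName": table_name,
        "AttributeDefinitions": [
            {"AttributeName": "First_id", "AttributeType": "S"},
            {"AttributeName": "Second_id", "AttributeType": "S"},
        ],
        "KeySchema": [
            {"AttributeName": "First_id", "KeyType": "HASH"},    # partition key
            {"AttributeName": "Second_id", "KeyType": "RANGE"},  # sort key
        ],
        "GlobalSecondaryIndexes": [
            {
                "IndexName": index_name,
                "KeySchema": [
                    {"AttributeName": "Second_id", "KeyType": "HASH"},
                    {"AttributeName": "First_id", "KeyType": "RANGE"},
                ],
                "Projection": {"ProjectionType": "ALL"},
            }
        ],
        "BillingMode": "PAY_PER_REQUEST",  # assumption: on-demand capacity
    }


params = build_create_table_params()
```

With boto3, these parameters would be passed as `boto3.client("dynamodb").create_table(**params)`.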
This is what I would like to model in DynamoDB:
Option 1: Use a third attribute to define the type of relationship:
First_id (Partition key)   Second_id (Sort key)   Data
------------------------   --------------------   ---------
Comment-A                  User-Harry             "LIKES"
Comment-B                  User-Harry             "LIKES"
Comment-A                  User-Harry             "FOLLOWS"
Comment-B                  User-Ron               "LIKES"
Comment-C                  User-Hermione          "FOLLOWS"
The downside to this approach is that query results contain redundant information: they return extra items you may not care about. For example, if you want to query all the users that like a given comment, you also have to process all the users that follow that comment. Likewise, if you want to query all the comments that a user likes, you also need to process all the comments that the user follows.
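To make that cost concrete, here is a minimal in-memory sketch in plain Python (no DynamoDB calls). It treats the rows above as a list, fetches everything under one partition key, and only then filters on the Data attribute. The function names are hypothetical.

```python
# Rows from the table above, as (First_id, Second_id, Data) tuples.
ROWS = [
    ("Comment-A", "User-Harry", "LIKES"),
    ("Comment-B", "User-Harry", "LIKES"),
    ("Comment-A", "User-Harry", "FOLLOWS"),
    ("Comment-B", "User-Ron", "LIKES"),
    ("Comment-C", "User-Hermione", "FOLLOWS"),
]


def query_partition(rows, first_id):
    """Stand-in for a key-only query: every row under one partition key."""
    return [row for row in rows if row[0] == first_id]


def users_with_relationship(rows, comment_id, relationship):
    """Return the matching users and how many rows had to be read to find them."""
    fetched = query_partition(rows, comment_id)
    matching = [second_id for _, second_id, data in fetched if data == relationship]
    return matching, len(fetched)


users, rows_read = users_with_relationship(ROWS, "Comment-A", "LIKES")
print(users, rows_read)  # one matching user, but two rows were read
```

The gap between rows read and rows kept grows with every relationship type stored under the same key.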
Option 2: Modify the keys to represent the relationship:
First_id (Partition key)   Second_id (Sort key)
------------------------   --------------------
LikeComment-A              LikeUser-Harry
LikeComment-B              LikeUser-Harry
FollowComment-A            FollowUser-Harry
LikeComment-B              LikeUser-Ron
FollowComment-C            FollowUser-Hermione
This makes it efficient to query independently:
The downside is that the same top-level entity now has multiple keys, which might make things complex as more relationships are added.
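For illustration, the prefixed keys and the two lookups could be sketched as Query request parameters like this. The helper names, the table name `ForumGraph`, and the index name `InvertedIndex` are hypothetical; the index is assumed to be the swapped-key GSI described earlier.

```python
def relationship_key(relationship, entity_id):
    """Prefix an entity id with the relationship, e.g. 'Like' + 'Comment-A'."""
    return f"{relationship}{entity_id}"


def users_for_comment_params(relationship, comment_id, table_name="ForumGraph"):
    """Query parameters for 'all users with this relationship to a comment'."""
    return {
        "TableName": table_name,
        "KeyConditionExpression": "First_id = :pk",
        "ExpressionAttributeValues": {
            ":pk": {"S": relationship_key(relationship, comment_id)},
        },
    }


def comments_for_user_params(relationship, user_id,
                             table_name="ForumGraph", index_name="InvertedIndex"):
    """Same lookup in the other direction, through the swapped-key GSI."""
    return {
        "TableName": table_name,
        "IndexName": index_name,
        "KeyConditionExpression": "Second_id = :pk",
        "ExpressionAttributeValues": {
            ":pk": {"S": relationship_key(relationship, user_id)},
        },
    }


likes_of_a = users_for_comment_params("Like", "Comment-A")
harry_follows = comments_for_user_params("Follow", "User-Harry")
```

Every new relationship type adds one more key variant per entity, which is the complexity mentioned above.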
Option 3: Skip adjacency lists altogether and use separate tables, maybe one for Users, one for Likes, and one for Follows.
Option 4: Traditional relational database. I'm not planning on going this route because this is a personal project and I want to explore DynamoDB, but if this is the right way to think about things, I'd love to hear why.
Thanks for reading this far! If there is anything I can do to simplify the question or clarify anything, please let me know :)
I've looked at the AWS best practices and this many-to-many SO post, and neither appears to address the many-to-many (with many) relationship, so any resources or guidance would be greatly appreciated.
Except for Scan, DynamoDB API operations require an equality operator (EQ) on the partition key for tables and GSIs. As a result, the partition key must be something that is easily queried by your application with a simple lookup. An example is using key=value, which returns either a unique item or fewer items.
If the table has only a partition key, then no two items can have the same partition key value.
Partition key – A simple primary key, composed of one attribute known as the partition key. DynamoDB uses the partition key's value as input to an internal hash function. The output from the hash function determines the partition (physical storage internal to DynamoDB) in which the item will be stored.
Adjacency lists are a design pattern that is useful for modeling many-to-many relationships in Amazon DynamoDB. More generally, they provide a way to represent graph data (nodes and edges) in DynamoDB.
Your Option 1 is not possible because it does not have unique primary keys. In your sample data, you can see that you have two entries for (Comment-A, User-Harry).
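A small in-memory sketch shows the effect. It models the table as a dict keyed on the composite primary key and assumes plain PutItem-style writes, which replace any existing item with the same key; the `put_item` helper here is a stand-in, not a DynamoDB call.

```python
# The table is modelled as a dict: (First_id, Second_id) -> Data.
table = {}


def put_item(table, first_id, second_id, data):
    """Stand-in for an unconditional write: same primary key means overwrite."""
    table[(first_id, second_id)] = data


put_item(table, "Comment-A", "User-Harry", "LIKES")
put_item(table, "Comment-A", "User-Harry", "FOLLOWS")  # replaces the "LIKES" item

print(table)  # only one item remains for (Comment-A, User-Harry)
```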
Solution 1
The way to implement what you are looking for is by using slightly different attributes for your table and the GSI. If Harry likes Comment A, then your attributes should be:
hash_key: User-Harry
gsi_hash_key: Comment-A
sort_key_for_both: Likes-User-Harry-Comment-A
Now you have only one partition key value for your top level entities in both the table and the GSI, and you can query for a specific relationship type by using the begins_with operator.
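A minimal sketch of this layout, written as plain request parameters in the low-level DynamoDB attribute-value format. The attribute names come from the example above; the helper names, the table name `ForumGraph`, and the index name `ByComment` are hypothetical.

```python
def relationship_item(relationship, user_id, comment_id):
    """One item per relationship, e.g. Harry likes Comment-A."""
    return {
        "hash_key": {"S": user_id},
        "gsi_hash_key": {"S": comment_id},
        "sort_key_for_both": {"S": f"{relationship}-{user_id}-{comment_id}"},
    }


def comments_by_user_params(user_id, relationship, table_name="ForumGraph"):
    """Query the table: everything this user has the given relationship with."""
    return {
        "TableName": table_name,
        "KeyConditionExpression":
            "hash_key = :pk AND begins_with(sort_key_for_both, :prefix)",
        "ExpressionAttributeValues": {
            ":pk": {"S": user_id},
            ":prefix": {"S": f"{relationship}-"},
        },
    }


def users_by_comment_params(comment_id, relationship,
                            table_name="ForumGraph", index_name="ByComment"):
    """Query the GSI: every user with the given relationship to this comment."""
    return {
        "TableName": table_name,
        "IndexName": index_name,
        "KeyConditionExpression":
            "gsi_hash_key = :pk AND begins_with(sort_key_for_both, :prefix)",
        "ExpressionAttributeValues": {
            ":pk": {"S": comment_id},
            ":prefix": {"S": f"{relationship}-"},
        },
    }


item = relationship_item("Likes", "User-Harry", "Comment-A")
likes_by_harry = comments_by_user_params("User-Harry", "Likes")
likers_of_a = users_by_comment_params("Comment-A", "Likes")
```

With boto3, each parameter dict would be passed to the client's `query(**params)` call. Because the relationship type is the sort key prefix, only matching items are read, unlike a filter applied after the query.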
Solution 2
You could make the relationship a top-level entity. For example, Likes-User-Harry-Comment-A would have two entries in the database because it is “adjacent to” both User-Harry and Comment-A.
This gives you flexibility if you want to model more complex information about the relationships in the future (including the ability to describe the relationship between relationships, such as Likes-User-Ron-User-Harry Causes Follows-User-Ron-User-Harry).
However, this strategy requires more items to be stored in the database, and it means that saving a “like” (so that it can be queried) is not an atomic operation. (You can work around that by writing only the relationship entity and then using DynamoDB Streams + Lambda to write the two entries I mentioned at the beginning of this solution.)
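A hedged sketch of the stream-driven part: a pure function that turns one DynamoDB Streams record into the two adjacency entries. It assumes the stream includes new images, that the relationship entity is stored with First_id equal to Second_id (like the top-level entities in the question), and that it carries two hypothetical attributes, `source_id` and `target_id`, naming its endpoints. The actual write is left out.

```python
def adjacency_entries_from_record(record):
    """Return the adjacency entries to write for one stream record, or []."""
    if record.get("eventName") != "INSERT":
        return []
    image = record.get("dynamodb", {}).get("NewImage", {})
    first_id = image.get("First_id", {}).get("S")
    second_id = image.get("Second_id", {}).get("S")
    # Only self-keyed items can be relationship entities. This also skips the
    # adjacency entries themselves, so the handler does not trigger itself forever.
    if first_id is None or first_id != second_id:
        return []
    # source_id / target_id are hypothetical attributes assumed for this sketch;
    # plain user and comment items do not have them and are skipped.
    if "source_id" not in image or "target_id" not in image:
        return []
    return [
        {"First_id": {"S": first_id}, "Second_id": image["source_id"]},
        {"First_id": {"S": first_id}, "Second_id": image["target_id"]},
    ]


def handler(event, context):
    """Lambda-style entry point: collect the entries for every record in the batch."""
    entries = []
    for record in event.get("Records", []):
        entries.extend(adjacency_entries_from_record(record))
    # A real handler would write these entries back to the table here.
    return entries
```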
Update: using DynamoDB Transactions, saving a "like" in this manner can actually be a fully ACID operation.
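As a sketch of the transactional variant, the relationship entity and its two adjacency entries can be grouped into one TransactWriteItems request. The key attribute names `First_id` and `Second_id` are borrowed from the question, and the table name `ForumGraph` and the helper name are hypothetical.

```python
def like_transaction_params(user_id, comment_id, table_name="ForumGraph"):
    """Build TransactWriteItems parameters that save a 'like' all-or-nothing."""
    like_id = f"Likes-{user_id}-{comment_id}"
    items = [
        # The relationship as its own top-level entity.
        {"First_id": {"S": like_id}, "Second_id": {"S": like_id}},
        # The two entries that make it adjacent to the user and the comment.
        {"First_id": {"S": like_id}, "Second_id": {"S": user_id}},
        {"First_id": {"S": like_id}, "Second_id": {"S": comment_id}},
    ]
    return {
        "TransactItems": [
            {"Put": {"TableName": table_name, "Item": item}} for item in items
        ]
    }


params = like_transaction_params("User-Harry", "Comment-A")
```

With boto3, this would be sent as `boto3.client("dynamodb").transact_write_items(**params)`; either all three puts succeed or none do.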