I am completing an exercise using DynamoDB to model a many-to-many relationship between posts and tags: each post can have many tags, and each tag can have many posts.
I have a primary key on id and a primary sort key on type, and then another global index on id and data. I added another global index on id and type again, but I think this is redundant.
Here is what I have so far.
id (partition key) | type (sort key) | target | data
-------------------|-----------------|--------|----------
1                  | post            | 1      | cool post
tag                | tag             | tag    | n/a
1                  | tag             | tag    | orange
------ inserting another tag will overwrite ------
1                  | tag             | tag    | green
I am taking advice from this awesome talk https://www.youtube.com/watch?v=jzeKPKpucS0 and these not-so-awesome docs https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-adjacency-graphs.html
The issue I am having is that if I try to add another tag with an id of "1" and a type of "tag", it will overwrite the existing tag because it would have the same composite key. What am I missing here? It seems like the suggestion is to make the primary key and sort key be the id and type. Should I have my type be more like "tag#orange"? In that case I could put a global index on the target with a sort key on the type. This way I could get all posts with a certain tag by querying target = "tag" and type starts with "tag".
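To make the collision concrete, here is a minimal Python sketch that models the table as a dict keyed by (id, type). It is only an in-memory stand-in for DynamoDB (a PutItem with an existing primary key replaces the whole item), and put_item is a hypothetical helper, not a boto3 call.

```python
# Minimal in-memory model of a table with a composite primary key.
# Assumption: a plain dict keyed by (partition key, sort key) stands in for
# DynamoDB; writing an existing key replaces the item, like PutItem does.

def put_item(table: dict, item: dict) -> None:
    key = (item["id"], item["type"])
    table[key] = item  # same (id, type) -> the previous item is overwritten


# Original design: every tag on post "1" shares the key ("1", "tag").
flat = {}
put_item(flat, {"id": "1", "type": "tag", "target": "tag", "data": "orange"})
put_item(flat, {"id": "1", "type": "tag", "target": "tag", "data": "green"})
print(len(flat), flat[("1", "tag")]["data"])  # 1 green

# "tag#orange"-style sort keys make each association key distinct.
prefixed = {}
for name in ("orange", "green"):
    put_item(prefixed, {"id": "1", "type": f"tag#{name}", "target": "tag", "data": name})
print(len(prefixed))  # 2
```

Only the sort key differs between the two "tag#..." items, which is enough for the composite key to keep them apart.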
Just looking for some advice on handling this sort of adjacency list data with Dynamo as it seems very interesting. Thanks!
There are two types of primary keys in DynamoDB:
- Partition key: a simple primary key composed of one attribute. If a table has only a partition key, no two items can have the same partition key value.
- Composite primary key: a combination of a partition key and a sort key. Items may share a partition key, but no two items can have the same partition key and sort key combination.

Every item you write into the table must include the full primary key, and that key must uniquely identify the item.
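As a rough illustration, a composite primary key is declared in the low-level CreateTable parameters with one HASH (partition) and one RANGE (sort) attribute. The table name "PostsAndTags" and the attribute names "PK"/"SK" below are hypothetical; the dict is only built and inspected here, not sent to AWS.

```python
# Sketch of low-level CreateTable parameters for a composite primary key.
# Assumptions: table name "PostsAndTags" and attribute names "PK"/"SK" are
# hypothetical; with boto3 the dict could be passed as
# boto3.client("dynamodb").create_table(**create_table_params).

create_table_params = {
    "TableName": "PostsAndTags",
    "AttributeDefinitions": [
        {"AttributeName": "PK", "AttributeType": "S"},
        {"AttributeName": "SK", "AttributeType": "S"},
    ],
    "KeySchema": [
        {"AttributeName": "PK", "KeyType": "HASH"},   # partition key
        {"AttributeName": "SK", "KeyType": "RANGE"},  # sort key
    ],
    "BillingMode": "PAY_PER_REQUEST",
}

# Uniqueness is enforced on the (PK, SK) pair, not on PK alone.
key_attributes = [k["AttributeName"] for k in create_table_params["KeySchema"]]
print(key_attributes)  # ['PK', 'SK']
```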
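You need a few modifications to the way you're modeling. In an adjacency list you have two types of items: top-level items (the posts and tags themselves) and association items (the links between a post and a tag).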
To build this adjacency list, you must follow two simple guidelines, which I think are missing in your example:

1. The primary key of each Post and Tag should include its type as well as its ID, e.g. Post-1 or Tag-3. From what I see in your examples, you set the primary key to just the item ID.
2. Items that represent associations should store the target ID. I don't see you storing it in your association items.
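Here is a small Python sketch of those two guidelines. make_key and association_items are hypothetical helpers, and "PK"/"SK" are assumed attribute names; each link is written in both directions so it can be queried from either side.

```python
# Hypothetical helpers following the two guidelines: keys carry the item type,
# and association items store the target ID in the sort key.

def make_key(item_type: str, item_id: int) -> str:
    """Build a type-prefixed key such as "Post-1" or "Tag-3"."""
    return f"{item_type}-{item_id}"


def association_items(post_id: int, tag_id: int) -> list:
    """Return both directions of a post/tag link as (PK, SK) items."""
    post_key = make_key("Post", post_id)
    tag_key = make_key("Tag", tag_id)
    return [
        {"PK": post_key, "SK": tag_key},  # lets you list the tags of a post
        {"PK": tag_key, "SK": post_key},  # lets you list the posts of a tag
    ]


print(association_items(2, 3))
# [{'PK': 'Post-2', 'SK': 'Tag-3'}, {'PK': 'Tag-3', 'SK': 'Post-2'}]
```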
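Let's say you have three posts ("hello world", "foo bar", "Whatever..."), three tags (cool, awesome, great), Post-1 tagged cool, and Post-2 tagged cool and great.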
You'd model it this way in DynamoDB:
PRIMARY-KEY | SORT-KEY | SOURCE DATA | TARGET DATA
--------------|-------------|--------------|-------------
Post-1 | Post-1 | hello world |
Post-2 | Post-2 | foo bar |
Post-3 | Post-3 | Whatever... |
Tag-1 | Tag-1 | cool |
Tag-2 | Tag-2 | awesome |
Tag-3 | Tag-3 | great |
Post-1 | Tag-1 | hello world | cool
Post-2 | Tag-1 | foo bar | cool
Post-2 | Tag-3 | foo bar | great
Tag-1 | Post-1 | cool | hello world
Tag-1 | Post-2 | cool | foo bar
Tag-3 | Post-2 | great | foo bar
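One way to load the table above is to generate the items from the posts, tags and links, then wrap them in low-level BatchWriteItem requests. This is a sketch: the table name "PostsAndTags" and the attribute names PK, SK, SourceData and TargetData are hypothetical stand-ins for the columns above. BatchWriteItem accepts at most 25 put/delete requests per call, so the requests are chunked.

```python
# Sketch: turn the example data into low-level BatchWriteItem requests.
# Assumptions: table and attribute names are hypothetical.

posts = {1: "hello world", 2: "foo bar", 3: "Whatever..."}
tags = {1: "cool", 2: "awesome", 3: "great"}
links = [(1, 1), (2, 1), (2, 3)]  # (post_id, tag_id)


def s(value: str) -> dict:
    return {"S": value}  # low-level string attribute value


def build_items() -> list:
    items = []
    for pid, text in posts.items():  # top-level post items: PK == SK
        items.append({"PK": s(f"Post-{pid}"), "SK": s(f"Post-{pid}"), "SourceData": s(text)})
    for tid, name in tags.items():  # top-level tag items: PK == SK
        items.append({"PK": s(f"Tag-{tid}"), "SK": s(f"Tag-{tid}"), "SourceData": s(name)})
    for pid, tid in links:  # each link is stored in both directions
        items.append({"PK": s(f"Post-{pid}"), "SK": s(f"Tag-{tid}"),
                      "SourceData": s(posts[pid]), "TargetData": s(tags[tid])})
        items.append({"PK": s(f"Tag-{tid}"), "SK": s(f"Post-{pid}"),
                      "SourceData": s(tags[tid]), "TargetData": s(posts[pid])})
    return items


def batch_requests(items: list, table: str = "PostsAndTags", size: int = 25) -> list:
    # One BatchWriteItem request per chunk of at most `size` items.
    return [
        {"RequestItems": {table: [{"PutRequest": {"Item": it}} for it in items[i:i + size]]}}
        for i in range(0, len(items), size)
    ]


items = build_items()
requests = batch_requests(items)
print(len(items), len(requests))  # 12 1
```

Each request dict could then be passed to boto3.client("dynamodb").batch_write_item(**request); anything returned in UnprocessedItems would still need to be retried.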
1. Query by primary-key == "Post-1" & sort-key == "Post-1"
   - returns: only Post-1
2. Query by primary-key == "Post-2" & sort-key BEGINS_WITH "Tag-"
   - returns: the Tag-1 and Tag-3 associations.
   Check the documentation on the begins_with key condition expression.
3. Query by primary-key == "Tag-1" & sort-key BEGINS_WITH "Post-"
   - returns: the Post-1 and Post-2 associations.
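The three example queries can be sketched in Python both as low-level Query parameters and as an in-memory evaluation over the keys from the table above. The table name and the PK/SK attribute names are assumptions, and run_query only mimics DynamoDB's key condition; it is not a boto3 call.

```python
# Sketch of the three example queries. Assumptions: table name "PostsAndTags"
# and attribute names PK/SK are hypothetical; run_query is an in-memory mimic.

TABLE = [
    ("Post-1", "Post-1"), ("Post-2", "Post-2"), ("Post-3", "Post-3"),
    ("Tag-1", "Tag-1"), ("Tag-2", "Tag-2"), ("Tag-3", "Tag-3"),
    ("Post-1", "Tag-1"), ("Post-2", "Tag-1"), ("Post-2", "Tag-3"),
    ("Tag-1", "Post-1"), ("Tag-1", "Post-2"), ("Tag-3", "Post-2"),
]


def key_condition(op: str) -> str:
    if op == "=":
        return "PK = :pk AND SK = :sk"
    return "PK = :pk AND begins_with(SK, :sk)"


def query_params(pk: str, sk: str, op: str = "begins_with") -> dict:
    return {
        "TableName": "PostsAndTags",
        "KeyConditionExpression": key_condition(op),
        "ExpressionAttributeValues": {":pk": {"S": pk}, ":sk": {"S": sk}},
    }


def run_query(pk: str, sk: str, op: str = "begins_with") -> list:
    # Equality on the partition key; equality or prefix match on the sort key.
    # Results are sorted by sort key, like a default (ascending) Query.
    match = (lambda v: v == sk) if op == "=" else (lambda v: v.startswith(sk))
    return sorted(v for p, v in TABLE if p == pk and match(v))


print(run_query("Post-1", "Post-1", op="="))  # ['Post-1']
print(run_query("Post-2", "Tag-"))            # ['Tag-1', 'Tag-3']
print(run_query("Tag-1", "Post-"))            # ['Post-1', 'Post-2']
```

With boto3, a dict from query_params could be passed as boto3.client("dynamodb").query(**params).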
Note that if you change the contents of a given post, you also need to update that value in all of its association items.
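To see how that update fans out, this sketch lists every item that would need rewriting when a post's content changes, using the example links. The attribute names SourceData and TargetData are hypothetical stand-ins for the SOURCE DATA and TARGET DATA columns.

```python
# Sketch: which items must be rewritten when a post's content changes,
# given that association items duplicate source/target data.
# Assumption: links is the example list of (post_id, tag_id) pairs; the
# returned tuples are (PK, SK, attribute to update).

links = [(1, 1), (2, 1), (2, 3)]


def items_to_update(post_id: int) -> list:
    post_key = f"Post-{post_id}"
    updates = [(post_key, post_key, "SourceData")]  # the post item itself
    for pid, tid in links:
        if pid == post_id:
            tag_key = f"Tag-{tid}"
            updates.append((post_key, tag_key, "SourceData"))  # Post -> Tag item
            updates.append((tag_key, post_key, "TargetData"))  # Tag -> Post item
    return updates


print(len(items_to_update(2)))  # 5
print(len(items_to_update(3)))  # 1
```

These could be sent as individual UpdateItem calls, or grouped in a TransactWriteItems request if they must change together.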
You can also choose not to store the post and tag content in association items, which saves storage space. In that case, though, you'd need two queries for example queries 2 and 3 above: one to retrieve the associations, and another to retrieve each item's data. Since querying is more expensive than storing data, I prefer to duplicate storage. But it really depends on whether your application is read-intensive or write-intensive. If it is read-intensive, duplicating content in associations reduces the number of reads. If it is write-intensive, not duplicating content saves the writes needed to update associations whenever the source item changes.
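In the non-duplicated variant, the second read could be a BatchGetItem built from the association results, since each target is a top-level item whose partition and sort keys are equal. This is a sketch: the table name, the PK/SK attribute names, and the hard-coded association results (what query 2 would return) are assumptions. BatchGetItem accepts up to 100 keys per call.

```python
# Sketch of the non-duplicated variant: association items hold only keys,
# so a second request fetches the tag items themselves.
# Assumptions: table name "PostsAndTags" and PK/SK attribute names are
# hypothetical; association_results mimics what query 2 would return.

association_results = [
    {"PK": {"S": "Post-2"}, "SK": {"S": "Tag-1"}},
    {"PK": {"S": "Post-2"}, "SK": {"S": "Tag-3"}},
]


def batch_get_params(results: list, table: str = "PostsAndTags") -> dict:
    # The target of each association is a top-level item whose PK == SK.
    keys = [{"PK": r["SK"], "SK": r["SK"]} for r in results]
    return {"RequestItems": {table: {"Keys": keys}}}


params = batch_get_params(association_results)
print([k["PK"]["S"] for k in params["RequestItems"]["PostsAndTags"]["Keys"]])
# ['Tag-1', 'Tag-3']
```

With boto3 this would be boto3.client("dynamodb").batch_get_item(**params), so the read costs two round trips instead of one.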
Hope this helps! ;)
I don't think you are missing anything. The idea is that the ID is unique for the type of item. Typically you would generate a long UUID for the ID rather than using sequential numbers. Another alternative is to use the datetime at which you created the item, probably with an added random number to avoid collisions when items are created at the same time.
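Both ID options can be sketched with the Python standard library. The helper names and the exact timestamp format are hypothetical choices; a zero-padded UTC timestamp has the side benefit of sorting chronologically as a string.

```python
import secrets
import uuid
from datetime import datetime, timezone


def uuid_id() -> str:
    # Random 128-bit UUID rendered as a 36-character string.
    return str(uuid.uuid4())


def time_based_id() -> str:
    # Creation time (UTC, microsecond precision) plus a random hex suffix to
    # avoid collisions between items created in the same instant.
    ts = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%f")
    return f"{ts}-{secrets.token_hex(4)}"


post_key = f"Post-{uuid_id()}"
print(post_key)
print(time_based_id())
```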
This answer I previously provided may also help a little: DynamoDB M-M Adjacency List Design Pattern
Don't remove the sort key - that won't help make your items more unique.