How to structure relationships in Azure Cosmos DB?

Tags:

azure-cosmosdb

I have two sets of data in the same collection in Cosmos DB: 'posts' and 'users'. They are linked by the posts that users create.

Currently my structure is as follows:

// user document
{
    id: 123,
    postIds: ['id1','id2']
}

// post document
{
    id: 'id1',
    ownerId: 123
}
{
    id: 'id2',
    ownerId: 123
}

My main issue with this setup is how fragile it is: code has to enforce the link, and if there's a bug, data can very easily be lost with no clear way to recover it.

I'm also concerned about performance: if a user has 10,000 posts, that's 10,000 lookups I'll have to do to resolve all the posts.

Is this the correct method for modelling entity relationships?


asked Dec 19 '18 14:12

1 Answer

As David said, it's a long discussion, but it is a very common one. Since I have an hour or so of "free" time, I'm more than glad to try to answer it, hopefully once and for all.

WHY NORMALIZE?

The first thing I notice in your post: you are looking for some level of referential integrity (https://en.wikipedia.org/wiki/Referential_integrity), which is something you need when you decompose a bigger object into its constituent pieces. That decomposition is also called normalization.

While this is normally done in a relational database, it is now also becoming popular in non-relational databases, since it helps a lot to avoid data duplication, which usually creates more problems than it solves.

https://docs.mongodb.com/manual/core/data-model-design/#normalized-data-models

But do you really need it? Since you have chosen to use a JSON document database, you should leverage the fact that it is able to store the entire document, and just store the post ALONG WITH all the owner data: name, surname, and any other data you have about the user who created it. Yes, I’m saying that you may want to consider not having posts and users, but just posts, with the user info inside them. This may actually be very correct, as you will be sure to get the EXACT data for the user as it existed at the moment the post was created. Say, for example, I create a post and I have biography "X". I then update my biography to "Y" and create a new post. The two posts will have different author biographies, and this is just right, as they have captured reality exactly.
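
To make the "X" and "Y" example concrete, here is a minimal Python sketch of a post that embeds a snapshot of its author. It only builds plain dictionaries and does not talk to Cosmos DB; the helper make_post, the author name and the fields type, title and createdAt are assumptions made up for illustration, not part of the original model.

```python
from copy import deepcopy
from datetime import datetime, timezone


def make_post(post_id, title, author):
    """Build a post document that embeds a snapshot of the author (hypothetical shape)."""
    return {
        "id": post_id,
        "type": "post",
        "title": title,
        "createdAt": datetime.now(timezone.utc).isoformat(),
        # Copy the author, so later changes to `author` do not leak into old posts.
        "author": deepcopy(author),
    }


author = {"id": "123", "name": "Jane", "biography": "X"}
first = make_post("id1", "First post", author)

author["biography"] = "Y"  # the author updates the biography
second = make_post("id2", "Second post", author)

print(first["author"]["biography"])   # X
print(second["author"]["biography"])  # Y
```

Each post keeps the biography that was current when it was written, which is exactly the behaviour described above.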

Of course, you may also want to display a biography on an author page. In this case you'll have a problem. Which one will you use? Probably the latest one.

If every author, in order to exist in your system, MUST have a published blog post, that may well be enough. But maybe you want an author to be able to write a biography and be listed in your system even before writing a blog post.

In that case you need to NORMALIZE the model and create a new document type just for authors. If this is your case, you also need to figure out how to handle the situation described before. When an author updates their biography, will you just update the author document, or create a new one? If you create a new one, so that you can keep track of all changes, will you also update all the previous posts so that they reference the new document, or not?
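
The two choices for a biography update can be sketched in plain Python as follows. This is only an illustration with in-memory dictionaries; the function names and the fields authorId, version and authorDocId are hypothetical, and a real model may look quite different.

```python
def update_in_place(authors, author_id, biography):
    """Option 1: overwrite the single author document; the old biography is lost."""
    authors[author_id]["biography"] = biography


def add_version(author_versions, author_id, biography):
    """Option 2: append a new author document; older versions stay readable."""
    versions = author_versions.setdefault(author_id, [])
    doc = {
        "id": f"{author_id}-v{len(versions) + 1}",
        "type": "author",
        "authorId": author_id,
        "version": len(versions) + 1,
        "biography": biography,
    }
    versions.append(doc)
    return doc


# Option 1: one document per author, always current.
authors = {"123": {"id": "123", "type": "author", "biography": "X"}}
update_in_place(authors, "123", "Y")

# Option 2: one document per biography version.
author_versions = {}
v1 = add_version(author_versions, "123", "X")
post = {"id": "id1", "type": "post", "ownerId": "123", "authorDocId": v1["id"]}
v2 = add_version(author_versions, "123", "Y")

# The old post still references the biography that was current when it was written.
# Whether to repoint it at v2 is the modelling decision you have to make.
latest = author_versions["123"][-1]
print(post["authorDocId"], latest["id"])  # 123-v1 123-v2
```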

As you can see the answer is complex, and REALLY depends on what kind of information you want to capture from the real world.

So, first of all, figure out if you really need to keep posts and users separated.

CONSISTENCY

Let’s assume that you really do want posts and users kept in separate documents, and so you normalize your model. In that case, keep in mind that Cosmos DB (like NoSQL databases in general) DOES NOT OFFER any native support for enforcing referential integrity, so you are pretty much on your own.

Indexes can help, of course. You may want to index the ownerId property so that, before deleting an author, for example, you can efficiently check whether any blog posts by that author would otherwise be left orphaned. Another option is to manually create and maintain ANOTHER document that, for each author, keeps track of the blog posts he/she has written. With this approach you can just look at that document to find out which blog posts belong to an author. You can try to keep this document automatically updated using triggers, or do it in your application.

Just keep in mind that when you normalize in a NoSQL database, keeping data consistent is YOUR responsibility. This is exactly the opposite of a relational database, where your responsibility is to keep data consistent when you de-normalize it.
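
As a rough illustration of the "check before deleting" idea, here is a Python sketch that uses an in-memory list as a stand-in for the collection. The type field and the function names are assumptions for the example, ids are written as strings, and the SQL in the docstring is only the kind of query the list comprehension stands in for, not something the code executes.

```python
def posts_by_owner(documents, owner_id):
    """In-memory stand-in for a query such as:
    SELECT * FROM c WHERE c.type = 'post' AND c.ownerId = @ownerId
    """
    return [
        d for d in documents
        if d.get("type") == "post" and d.get("ownerId") == owner_id
    ]


def delete_author(documents, author_id):
    """Application-level integrity check: refuse to orphan existing posts."""
    orphans = posts_by_owner(documents, author_id)
    if orphans:
        raise ValueError(f"author {author_id} still owns {len(orphans)} post(s)")
    # No posts reference the author, so removing the user document is safe.
    documents[:] = [
        d for d in documents
        if not (d.get("type") == "user" and d["id"] == author_id)
    ]


collection = [
    {"id": "123", "type": "user"},
    {"id": "id1", "type": "post", "ownerId": "123"},
    {"id": "id2", "type": "post", "ownerId": "123"},
]

try:
    delete_author(collection, "123")
except ValueError as err:
    print(err)  # author 123 still owns 2 post(s)
```

Note that nothing in the database stops other code from skipping this check, which is the point made above: the application owns the consistency rules.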

PERFORMANCE

Performance COULD be an issue, but you don't usually model for performance in the first place. You model to make sure your model can represent and store the information you need from the real world, and then you optimize it to get decent performance from the database you have chosen. As different databases have different constraints, the model is then adapted to deal with those constraints. This is nothing more and nothing less than the good old “logical” vs “physical” modeling discussion.

In Cosmos DB's case, you should avoid queries that go cross-partition, as they are more expensive.

Unfortunately, partitioning is something you choose once and for all, so you really need to be clear about which use cases are the most common and which ones you want to support best. If the majority of your queries are done on a per-author basis, I would partition per author.
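
A toy Python model may help to see why the partition key matters. It groups documents by a key and counts how many groups a query has to read. This is a simplification for illustration only: it is not how Cosmos DB routes queries internally, and partition_by, query and the sample posts are all made up.

```python
from collections import defaultdict


def partition_by(documents, key):
    """Group documents by the value of the chosen partition key."""
    partitions = defaultdict(list)
    for doc in documents:
        partitions[doc[key]].append(doc)
    return partitions


def query(partitions, predicate, partition_key=None):
    """Return (matching documents, number of partitions that were read)."""
    if partition_key is not None:
        scanned = [partitions.get(partition_key, [])]
    else:
        scanned = list(partitions.values())
    hits = [d for part in scanned for d in part if predicate(d)]
    return hits, len(scanned)


posts = [
    {"id": "id1", "ownerId": "123", "title": "Hello"},
    {"id": "id2", "ownerId": "123", "title": "Modelling"},
    {"id": "id3", "ownerId": "456", "title": "Hello again"},
    {"id": "id4", "ownerId": "789", "title": "Partitions"},
]
partitions = partition_by(posts, "ownerId")

# Filter on the partition key: only one partition is read.
mine, scanned = query(partitions, lambda d: True, partition_key="123")
print(len(mine), scanned)  # 2 1

# Filter on anything else: every partition has to be read.
hello, scanned_all = query(partitions, lambda d: "Hello" in d["title"])
print(len(hello), scanned_all)  # 2 3
```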

Now, while this may seem a clever choice, it is one only if you have A LOT of authors. If you have only one, for example, all data and queries will go to just one partition, limiting your performance A LOT. Remember, in fact, that Cosmos DB RUs are split among all the available partitions: with 10,000 RU/s, for example, you usually get 5 partitions, which means that all your data will be spread across 5 partitions. Each partition will have an upper limit of 2,000 RU/s. If all your queries use just one partition, your real maximum performance is 2,000 RU/s, not 10,000.
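
The arithmetic behind those numbers, as a small Python sketch. The figures of 10,000 RU/s and 5 partitions are just the example values from the paragraph above; the actual number of partitions is decided by the service, and the function names are made up.

```python
def per_partition_limit(total_ru, partition_count):
    """Provisioned throughput is split evenly among the partitions."""
    return total_ru / partition_count


def usable_throughput(total_ru, partition_count, partitions_hit):
    """Upper bound on throughput when the workload only touches some partitions."""
    return per_partition_limit(total_ru, partition_count) * partitions_hit


total_ru = 10_000     # example value
partition_count = 5   # example value

print(per_partition_limit(total_ru, partition_count))   # 2000.0
print(usable_throughput(total_ru, partition_count, 1))  # 2000.0
print(usable_throughput(total_ru, partition_count, 5))  # 10000.0
```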

I really hope this helps you start to figure out the answer. And I really hope it helps to foster and grow a discussion (how to model for a document database) that I think is really due and mature now.


answered Sep 27 '22 23:09

mauridb
