Should there be one entity per collection in document DB?
Consider I have foreign key relationship in below diagram:
Should I create two collections one for employee & other for company. Or should I store them into a single collection?
I read here that in documentdb scope of stored procedures triggers etc are within a collection. So by splitting differetn entities into separate collection I loose out of box functionality.
So wouldn't it be better to dump both the classes as single entity as below:
{ "Id": 1001, "Industry": "Software", "Employees": [ { "Id": 10011, "Name": "John Doe", "CompanyId": 1001 }, { "Id": 10012, "Name": "Jane Doe", "CompanyId": 1001 } ] }
What is the standard practise of implementing related entities in DocumentDB?
It is generally good to store multiple entity types per collection. Whether to store entity types within to a single document or not takes a bit more thought.
As David mentioned - how to model data is a bit subjective.
Storing Multiples Entity Types within a Collection
First... let's talk about storing multiple entities in a collection. DocumentDB collections are not tables. Collections do not enforce schema; in other words, you can store different types of documents with different schemas in the same collection. You can track different types of entities simply by adding a type attribute to your document.
You should think of Collections as a unit of partition and boundary for the execution of queries and transactions. Thus a huge perk for storing different entity types within the same collection is you get transaction support right out of the box via sprocs.
Storing Multiple Entity Types within a Document
Whether you store multiple entity types within a single document takes a bit more thought. This is commonly referred to de-normalizing (capturing relationships between data by embedding data in a single document) and normalizing (capturing relationships between data by creating weak links to o other documents) your data.
Typically de-normalizing provides better read performance.
The application may need to issue fewer queries and updates to complete common operations.
In general, use de-normalized data models when:
Example of a de-normalized data model:
{ "Id": 1001, "Type": "Company", "Industry": "Software", "Employees": [ { "Id": 10011, "Type": "Employee", "Name": "John Doe" }, { "Id": 10012, "Type": "Employee", "Name": "Jane Doe" } ] }
Typically normalizing provides better write performance.
Provides more flexibility than de-normalizing
Client-side applications must issue follow-up queries to resolve the references. In other words, normalized data models can require more round trips to the server.
In general, use normalized data models:
Example of a normalized data model:
{ "Id": 1001, "Type": "Company", "Industry": "Software" } { "Id": 10011, "Type": "Employee", "Name": "John Doe", "CompanyId": 1001 } { "Id": 10012, "Type": "Employee", "Name": "Jane Doe", "CompanyId": 1001 }
Hybrid Approaches
Choosing between normalizing and de-normalizing doesn't have to be a black and white choice. I've often found that a winning design pattern is a hybrid approach, in which you may choose to normalize a partial set of an object's fields and de-normalize the others.
In other words, you could choose to de-normalize frequently read stable (or immutable) properties to reduce the need for follow up queries, while normalize frequently written / mutating fields to reduce the need for fanning out writes.
Example of a hybrid approach:
// Author documents: [{ "id": 1, "firstName": "Thomas", "lastName": "Andersen", "countOfBooks": 3, "books": [1, 2, 3], "images": [{ "thumbnail": "http://....png" }, { "profile": "http://....png" }, { "large": "http://....png" }] }, { "id": 2, "firstName": "William", "lastName": "Wakefield", "countOfBooks": 1, "books": [1, 4, 5], "images": [{ "thumbnail": "http://....png" }] }] // Book documents: [{ "id": 1, "name": "DocumentDB 101", "authors": [{ "id": 1, "name": "Thomas Andersen", "thumbnailUrl": "http://....png" }, { "id": 2, "name": "William Wakefield", "thumbnailUrl": "http://....png" }] }, { "id": 2, "name": "DocumentDB for RDBMS Users", "authors": [{ "id": 1, "name": "Thomas Andersen", "thumbnailUrl": "http://....png" }, ] }]
Your question is a bit subjective, as you're asking for entity design, and for that, there's no single right answer.
However: From a more objective standpoint: There's nothing stopping you from having multiple entity types within a collection (e.g. Company
document type and Employee
document type, in your case).
You'd need to include some type of hint for yourself (maybe a type
property) to help differentiate between the two when running your queries. But, by having both types within the same collection, you now have a collection-scope to work within. Regarding a type
property: Since DocumentDB indexes all properties by default, a type
property would be easy to integrate into your queries.
EDIT Removed part about 3-collections-per-capacity-unit, as that arrangement was removed when DocumentDB shifted from Preview to Production.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With