Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Single or Multiple Entities Per Collection in DocumentDB

Tags:

Should there be one entity per collection in document DB?

Consider I have foreign key relationship in below diagram: enter image description here

Should I create two collections one for employee & other for company. Or should I store them into a single collection?

I read here that in documentdb scope of stored procedures triggers etc are within a collection. So by splitting differetn entities into separate collection I loose out of box functionality.

So wouldn't it be better to dump both the classes as single entity as below:

{   "Id": 1001,   "Industry": "Software",   "Employees": [     {       "Id": 10011,       "Name": "John Doe",       "CompanyId": 1001     },     {       "Id": 10012,       "Name": "Jane Doe",       "CompanyId": 1001     }   ] } 

What is the standard practise of implementing related entities in DocumentDB?

like image 283
Abhijeet Avatar asked Dec 13 '14 07:12

Abhijeet


Video Answer


2 Answers

It is generally good to store multiple entity types per collection. Whether to store entity types within to a single document or not takes a bit more thought.

As David mentioned - how to model data is a bit subjective.

Storing Multiples Entity Types within a Collection

First... let's talk about storing multiple entities in a collection. DocumentDB collections are not tables. Collections do not enforce schema; in other words, you can store different types of documents with different schemas in the same collection. You can track different types of entities simply by adding a type attribute to your document.

You should think of Collections as a unit of partition and boundary for the execution of queries and transactions. Thus a huge perk for storing different entity types within the same collection is you get transaction support right out of the box via sprocs.

Storing Multiple Entity Types within a Document

Whether you store multiple entity types within a single document takes a bit more thought. This is commonly referred to de-normalizing (capturing relationships between data by embedding data in a single document) and normalizing (capturing relationships between data by creating weak links to o other documents) your data.

Typically de-normalizing provides better read performance.

The application may need to issue fewer queries and updates to complete common operations.

In general, use de-normalized data models when:

  • have “contains” relationships between entities
  • have one-to-few relationships between entities
  • de-normalized data changes infrequently
  • de-normalized data won’t grow without bound
  • de-normalized data is integral to data in document

Example of a de-normalized data model:

{   "Id": 1001,   "Type": "Company",   "Industry": "Software",   "Employees": [     {       "Id": 10011,       "Type": "Employee",       "Name": "John Doe"     },     {       "Id": 10012,       "Type": "Employee",       "Name": "Jane Doe"     }   ] } 

Typically normalizing provides better write performance.

Provides more flexibility than de-normalizing

Client-side applications must issue follow-up queries to resolve the references. In other words, normalized data models can require more round trips to the server.

In general, use normalized data models:

  • when de-normalizing would result in duplication of data but would not provide sufficient read performance advantages to outweigh the implications of the duplication.
  • representing one-to-many relationships
  • represent many-to-many relationships.
  • related data changes frequently

Example of a normalized data model:

{   "Id": 1001,   "Type": "Company",   "Industry": "Software" }  {   "Id": 10011,   "Type": "Employee",   "Name": "John Doe",   "CompanyId": 1001 }  {   "Id": 10012,   "Type": "Employee",   "Name": "Jane Doe",   "CompanyId": 1001 } 

Hybrid Approaches

Choosing between normalizing and de-normalizing doesn't have to be a black and white choice. I've often found that a winning design pattern is a hybrid approach, in which you may choose to normalize a partial set of an object's fields and de-normalize the others.

In other words, you could choose to de-normalize frequently read stable (or immutable) properties to reduce the need for follow up queries, while normalize frequently written / mutating fields to reduce the need for fanning out writes.

Example of a hybrid approach:

// Author documents: [{   "id": 1,   "firstName": "Thomas",   "lastName": "Andersen",   "countOfBooks": 3,   "books": [1, 2, 3],   "images": [{     "thumbnail": "http://....png"   }, {     "profile": "http://....png"   }, {     "large": "http://....png"   }] }, {   "id": 2,   "firstName": "William",   "lastName": "Wakefield",   "countOfBooks": 1,   "books": [1, 4, 5],   "images": [{     "thumbnail": "http://....png"   }] }]  // Book documents: [{   "id": 1,   "name": "DocumentDB 101",   "authors": [{     "id": 1,     "name": "Thomas Andersen",     "thumbnailUrl": "http://....png"   }, {     "id": 2,     "name": "William Wakefield",     "thumbnailUrl": "http://....png"   }] }, {   "id": 2,   "name": "DocumentDB for RDBMS Users",   "authors": [{     "id": 1,     "name": "Thomas Andersen",     "thumbnailUrl": "http://....png"   }, ] }] 
like image 119
Andrew Liu Avatar answered Sep 29 '22 10:09

Andrew Liu


Your question is a bit subjective, as you're asking for entity design, and for that, there's no single right answer.

However: From a more objective standpoint: There's nothing stopping you from having multiple entity types within a collection (e.g. Company document type and Employee document type, in your case).

You'd need to include some type of hint for yourself (maybe a type property) to help differentiate between the two when running your queries. But, by having both types within the same collection, you now have a collection-scope to work within. Regarding a type property: Since DocumentDB indexes all properties by default, a type property would be easy to integrate into your queries.

EDIT Removed part about 3-collections-per-capacity-unit, as that arrangement was removed when DocumentDB shifted from Preview to Production.

like image 25
David Makogon Avatar answered Sep 29 '22 10:09

David Makogon