Without JOINs, what is the right way to handle data in document databases?

Tags: sql, mongodb, mongodb-query, document, azure-cosmosdb

I understand that JOINs are either not possible or frowned upon in document databases. I'm coming from a relational database background and trying to understand how to handle such scenarios.

Let's say I have an Employees collection where I store all employee-related information. The following is a typical employee document:

{
   "id": 1234,
   "firstName": "John",
   "lastName": "Smith",
   "gender": "Male",
   "dateOfBirth": "3/21/1967",
   "emailAddresses":[
      { "email": "[email protected]", "isPrimary": true },
      { "email": "[email protected]", "isPrimary": false }
   ]
}

Let's also say I have a separate Projects collection where I store project data that looks something like this:

{
   "id": 444,
   "projectName": "My Construction Project",
   "projectType": "Construction",
   "projectTeam":[
      { "_id": 2345, "position": "Engineer" },
      { "_id": 1234, "position": "Project Manager" }
   ]
}

If I want to return a list of all my projects along with their project teams, how do I make sure I return all the pertinent information about the individuals on each team, i.e., full names, email addresses, etc.?

Is it two separate queries? One for projects and another for the people whose IDs appear in the projects collection?

If so, how do I then fill in the data about those people, i.e., full names and email addresses? Do I loop over the results with a foreach in my app to add that data?

If I'm relying on my application to handle populating all the pertinent data, is this not a performance hit that would offset the performance benefits of document databases such as MongoDB?

Thanks for your help.


asked Aug 29 '14 16:08

Sam

1 Answer

"...how do I handle making sure that I return all the pertinent information about individuals in the team i.e. full names, email addresses, etc? Is it two separate queries?"

It is either two separate queries OR you denormalize the employee data into the Project document. In our applications we do the second query and keep the data as normalized as possible in the documents.
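A minimal sketch of the two-query approach, with both collections simulated as in-memory arrays so it runs without a server. The helpers `findProjects` and `findEmployeesByIds` are hypothetical stand-ins: against a real MongoDB they would correspond roughly to `db.projects.find({})` and `db.employees.find({ _id: { $in: ids } })`, assuming employees are keyed by a top-level `_id`. The sample names and example.com addresses are made up. The point it illustrates is that the second query is one batched `$in` lookup rather than one query per team member, and the "foreach" in the app is just an in-memory merge over data already fetched.

```javascript
// Hypothetical sample data standing in for the two collections
// (top-level _id, terse keys, made-up example.com addresses).
const employees = [
  { _id: 1234, firstName: "John", lastName: "Smith", emails: ["john.smith@example.com"] },
  { _id: 2345, firstName: "Jane", lastName: "Doe", emails: ["jane.doe@example.com"] },
];

const projects = [
  {
    _id: 444,
    name: "My Construction Project",
    type: "Construction",
    team: [
      { empId: 2345, pos: "Engineer" },
      { empId: 1234, pos: "Project Manager" },
    ],
  },
];

// Query 1, roughly db.projects.find({}) on a real server.
function findProjects() {
  return projects;
}

// Query 2, roughly db.employees.find({ _id: { $in: ids } }):
// one batched lookup for every team member, not one query per person.
function findEmployeesByIds(ids) {
  const wanted = new Set(ids);
  return employees.filter((e) => wanted.has(e._id));
}

// Application-side "join": gather the distinct empIds, fetch those employees
// once, index them by _id in a Map, then attach details to each team entry.
function projectsWithTeams() {
  const projs = findProjects();
  const ids = [...new Set(projs.flatMap((p) => p.team.map((m) => m.empId)))];
  const byId = new Map(findEmployeesByIds(ids).map((e) => [e._id, e]));

  return projs.map((p) => ({
    ...p,
    team: p.team.map((m) => {
      const e = byId.get(m.empId); // undefined if the employee no longer exists
      return {
        ...m,
        fullName: e ? `${e.firstName} ${e.lastName}` : null,
        emails: e ? e.emails : [],
      };
    }),
  }));
}

console.log(JSON.stringify(projectsWithTeams(), null, 2));
```

With a real driver the two helper bodies would become (awaited) `find` calls; the merge logic stays the same, so the cost is roughly two round trips no matter how large the teams are.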

It is actually NOT common to see the "_id" key anywhere but on the top-level document. Further, for collections that will hold millions of documents, you save storage by keeping the keys "terse", because every document stores its own copy of each field name. Consider "name" rather than "projectName", "type" rather than "projectType", "pos" rather than "position". It seems trivial, but it adds up. You'll also want to put an index on "team.empId" so that a query such as "how many projects has Joe Average worked on?" runs well.
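To put a rough number on "it adds up": in BSON each field name is stored as a null-terminated string inside every document, so it costs its UTF-8 length plus one byte per occurrence. The sketch below counts only field-name bytes (type bytes and values are the same in both shapes, and array index keys are skipped for the same reason) for the question's project document and a terse version of it. It ignores storage-engine compression such as WiredTiger's block compression, so treat the result as an illustration, not a measurement.

```javascript
// Approximate BSON bytes spent on field names: each name is stored as a
// null-terminated string in every document, i.e. (UTF-8 length + 1) bytes.
// Array index keys ("0", "1", ...) are identical in both shapes, so they are skipped.
function fieldNameBytes(value) {
  if (Array.isArray(value)) {
    return value.reduce((sum, v) => sum + fieldNameBytes(v), 0);
  }
  if (value !== null && typeof value === "object") {
    return Object.entries(value).reduce(
      (sum, [k, v]) => sum + Buffer.byteLength(k, "utf8") + 1 + fieldNameBytes(v),
      0
    );
  }
  return 0;
}

// The question's project document vs. a terse equivalent.
const verbose = {
  id: 444, projectName: "My Construction Project", projectType: "Construction",
  projectTeam: [{ _id: 2345, position: "Engineer" }, { _id: 1234, position: "Project Manager" }],
};
const terse = {
  _id: 444, name: "My Construction Project", type: "Construction",
  team: [{ empId: 2345, pos: "Engineer" }, { empId: 1234, pos: "Project Manager" }],
};

const saved = fieldNameBytes(verbose) - fieldNameBytes(terse);
console.log(fieldNameBytes(verbose), fieldNameBytes(terse), saved); // 65 39 26
console.log(`~${(saved * 1000000) / 1e6} MB of field names saved per 1,000,000 documents`);
```

Here the terse shape saves 26 bytes of field names per document, about 26 MB per million documents before compression, and the saving grows with every additional team member.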

{
  "_id": 444,
  "name": "My Construction Project",
  "type": "Construction",
  "team":[
    { "empId": 2345, "pos": "Engineer" },
    { "empId": 1234, "pos": "Project Manager" }
  ]
}
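On a real server the index and the "how many projects" query would be something like `db.projects.createIndex({ "team.empId": 1 })` and `db.projects.countDocuments({ "team.empId": 1234 })`. Because `team` is an array, MongoDB builds a multikey index with an entry for each element. The sketch below is only a rough in-memory analogy of what such an index lets you do (look up projects by empId without scanning every project), not how MongoDB implements it; the second project is hypothetical sample data.

```javascript
// Hypothetical sample projects in the terse shape above (the second one is made up).
const projects = [
  { _id: 444, name: "My Construction Project", type: "Construction",
    team: [{ empId: 2345, pos: "Engineer" }, { empId: 1234, pos: "Project Manager" }] },
  { _id: 445, name: "Office Fit-out", type: "Interior",
    team: [{ empId: 1234, pos: "Engineer" }] },
];

// Rough analogy to a multikey index on "team.empId": one entry per distinct
// empId, pointing at the projects whose team array contains it.
function buildTeamIndex(docs) {
  const index = new Map(); // empId -> Set of project _ids
  for (const p of docs) {
    for (const m of p.team) {
      if (!index.has(m.empId)) index.set(m.empId, new Set());
      index.get(m.empId).add(p._id);
    }
  }
  return index;
}

const teamIndex = buildTeamIndex(projects);

// Same answer a full scan with { "team.empId": empId } would give, but via the index.
function countProjectsFor(empId) {
  const hits = teamIndex.get(empId);
  return hits ? hits.size : 0;
}

console.log(countProjectsFor(1234)); // 2
console.log(countProjectsFor(2345)); // 1
console.log(countProjectsFor(9999)); // 0
```

Using a Set per empId means an employee listed twice on the same team still counts that project once.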

Another thing to get used to is that you don't have to write the whole document every time you want to update an individual field or, say, add a new member to the team. You can do targeted updates that uniquely identify the document but only update an individual field or array element.

db.projects.updateOne(
  { _id: 444 },                                           // uniquely identify the project
  { $addToSet: { team: { empId: 666, pos: "Minion" } } }  // change only the team array
);
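One subtlety of `$addToSet` with embedded documents: MongoDB treats an element as a duplicate only when an existing array element matches exactly, meaning the same fields, the same values, and the same field order. The in-memory sketch below mimics that rule using `JSON.stringify` as an order-sensitive equality check; that is adequate for plain JSON-like values but is an assumption for illustration, not how the server compares BSON.

```javascript
// Order-sensitive equality for plain JSON-like values; a rough stand-in
// for how $addToSet decides an embedded document is already present.
function sameDoc(a, b) {
  return JSON.stringify(a) === JSON.stringify(b);
}

// In-memory sketch of { $addToSet: { team: member } } applied to one project.
function addToSet(project, member) {
  if (!project.team.some((m) => sameDoc(m, member))) {
    project.team.push(member);
  }
  return project;
}

const project = { _id: 444, team: [{ empId: 2345, pos: "Engineer" }] };

addToSet(project, { empId: 666, pos: "Minion" });   // added
addToSet(project, { empId: 666, pos: "Minion" });   // exact duplicate: ignored
addToSet(project, { empId: 666, pos: "Overlord" }); // different pos: added again
addToSet(project, { pos: "Minion", empId: 666 });   // same fields, different order: added

console.log(project.team.length); // 4
```

So `$addToSet` does not stop the same employee from joining a team twice under different positions. If that matters, one common pattern is to put the uniqueness check in the filter instead, e.g. `db.projects.updateOne({ _id: 444, "team.empId": { $ne: 666 } }, { $push: { team: { empId: 666, pos: "Minion" } } })`, which only matches when no current member has that empId.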

Needing two queries to get one thing done hurts at first, but you'll get past it.


answered Oct 15 '22 17:10

Bob Kuhar
